Strand™


New Concepts in Parallel Programming 


Ian Foster 
Argonne National Laboratory 


Stephen Taylor 


California Institute of Technology 





PRENTICE HALL 
Englewood Cliffs, New Jersey 07632 


Library of Congress Cataloging-in-Publication Data 


Foster, Ian 
Strand : new concepts in parallel programming. 
Bibliography: p. 
Includes index. 
1. Strand (Computer program language) 2. Parallel 
programming (Computer science) I. Taylor, Stephen II. Title.
QA76.73.S77F67 1990 005.13’3 89-8769
ISBN 0-13-850587-X 


Editorial/production supervision: Kathleen Schiaparelli 
Cover design: Lundgren Graphics 

Cover cartoon: Don Martinetti 

Manufacturing buyer: Bob Anderson 


The author and publisher of this book have used their best efforts in preparing this book. These efforts 
include the development, research, and testing of the theories and programs to determine their effectiveness. 
The author and publisher make no warranty of any kind, expressed or implied, with regard to these programs 
or the documentation contained in this book. The author and publisher shall not be liable in any event for 
incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or 
use of these programs. 


STRAND 88™ and Strand™ are trademarks of Artificial Intelligence Limited. 
Strand Software Technologies is a Division of Artificial Intelligence Limited. 


The opening quote is taken from F. P. Brooks, The Mythical Man-Month, © 1975, 
Addison-Wesley Publishing Co., Inc., Reading, Massachusetts. Reprinted with permission. 


© 1990 by Prentice-Hall, Inc. 
A Division of Simon & Schuster 
Englewood Cliffs, New Jersey 07632 





All rights reserved. No part of this book may be 
reproduced, in any form or by any means, 
without permission in writing from the publisher. 


Printed in the United States of America 


10 9 8 7 6 5 4 3 2 1


ISBN 0-13-850587-X


Prentice-Hall International (UK) Limited, London 
Prentice-Hall of Australia Pty. Limited, Sydney 
Prentice-Hall Canada Inc., Toronto 

Prentice-Hall Hispanoamericana, S.A., Mexico 
Prentice-Hall of India Private Limited, New Delhi 
Prentice-Hall of Japan, Inc., Tokyo 

Simon & Schuster Asia Pte. Ltd., Singapore 

Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro 


To our parents 


Contents 


1 Introducing Strand 


1.1 An Analogy
1.1.1 Problem Solving
1.1.2 Further Observations
1.1.3 Summary
1.2 Tying Concepts Together
1.2.1 Design Methodology
1.2.2 Performance Enhancement and Extensibility
1.2.3 Concluding Remarks
1.3 An Overview of the Book


I Basic Concepts 


2 Strand Programming 


2.1 Denoting Data
2.2 Defining Processes
2.3 Playing with Numbers
2.4 How Processes Execute
2.5 Basic Operations
2.5.1 Assignment
2.5.2 Matching
2.5.3 Guard Execution
2.6 Playing with Structures
2.7 A Programming Convenience
2.8 A Strand Interpreter
2.9 The Operational Model
2.10 A Programming Example
2.11 Summary


3 Six Basic Techniques 
3.1 Communication Protocols
3.1.1 Producer-Consumers
Incomplete Messages
Incomplete Messages with Mergers
3.1.4 Bounded Buffers
Difference Lists
Short-Circuits
Task Sequencing
Implementing Testing Predicates
3.4.2 Blackboard Implementation


4 Two Ways to Solve a Problem


4.1 The Paving Problem
4.2 Stepwise Refinement
4.3 A Process-Oriented Solution
Refining the Kruskal Process
Managing Sets
The Sets Process
The Process-Oriented Program
A Data Structure Solution
The Sets Data Structure
4.4.2 Program using Data Structures
Improving the Data Structure Solution
4.5 Concluding Remarks


5 Programming Problems


5.1 The Family Pastimes Problem
The Booking Agency Problem
The Basic Booking System
5.2.2 Multiple Bookings
5.2.3 Discussion
The Speedy Pizza Problem
5.3.2 The Ceiling
5.3.3 Speedy’s Couriers
5.3.4 Discussion
The Noble Ancestors Problem
Exhaustive Parallel Search
Depth-First Search
Depth-Bounded Search
Heuristic Search
Collecting Solutions
5.4.6 Discussion
5.5 The Philosophical Programmers Problem
5.6 Summary


II Advanced Techniques 


6 Programming in the Large 


6.1 Principles
6.2 Modularization Using Stepwise Refinement
6.3 Using Modular Decomposition
6.3.1 Interface Specifications
6.3.2 Evaluation of the Second Modularization
6.4 Linguistic Support for Modularization
6.4.1 The Editor Module
6.5 Advanced Programming with Modules
6.5.1 Run-Time Resolution of Names
6.5.2 Load-Time Resolution of Names
6.5.3 Compile-Time Resolution of Names
6.6 Code Mapping
6.7 Summary


7 Integrating Existing Code 


7.1 Designing the Interface
7.2 Implementing the Interface
7.3 Six Design Principles
7.3.1 Relaxing the Single-Assignment Rule (P6)
7.3.2 Allowing Unaccomplished Effects (P4)
7.3.3 Storing Pointers (P5)
7.4 A Database Management System
7.4.1 The Interface
7.4.2 A Database Monitor
7.4.3 A Database Manager
7.4.4 Discussion
7.5 Summary


8 Process Mapping


8.1 Ring Mappings
8.1.1 List Intersection
8.1.2 Merge Sort
8.2 Torus Mappings
8.2.1 Matrix Multiply
8.3 Comparing Regular Mappings
8.4 An Irregular Mapping
8.4.1 Problem Mapping
8.4.2 Manager Definition
8.4.3 Worker Definition
8.4.4 Constraining Concurrency
8.4.5 Discussion
8.5 A Multilingual Mapping
8.5.1 Mapping Grid Problems
8.5.2 The Interface
8.5.3 The Strand Program
8.5.4 Improving Memory Performance
8.5.5 Discussion
8.6 Summary


9 Metaprogramming


9.1 Program Analysis
9.2 Interpreters
9.2.1 A Strand Interpreter
9.2.2 A Tracing Interpreter
9.3 Source-to-Source Transformations
9.3.1 Generating the Rule Form
9.3.2 Adding a Short-Circuit
9.4 Summary


III Case Studies


10 Reasoning About Equality


10.1 Introduction
10.2 The Sequential Algorithm
10.3 Overview of the Approach
10.4 Implementing the Graph Algorithms
10.4.1 The Find Algorithm
10.4.2 Adding Equalities
10.5 Synchronization Issues
10.6 Conclusions


11 Aligning Genetic Sequences


11.1 Introduction
11.2 The Problem
11.2.1 Why Create Alignments?
11.2.2 What Is a Correct Alignment?
11.3 Our Alignment Algorithm
11.4 Strand as an Implementation Vehicle
11.5 Developing the Bilingual Program
11.6 Using Multiprocessors
11.6.1 The Strand Program
11.6.2 The Scheduler
11.6.3 The Transformation
11.6.4 Performance Studies
11.7 Summary


12 Discrete Event Simulation 


12.1 Introduction
12.2
12.3 Representation Issues
12.3.1 Logical Process Representation
12.3.2 Domain Isolation
12.3.3 Detecting Completion
12.3.4 Spawning the Network
12.4 A Sequential Event Queue Solution
12.5 A Concurrent Time Warp Solution
12.6 Discussion


13 Programming Telephony 


13.1 History
13.2 Introduction To Erlang
13.2.1 Motivation for Erlang
13.2.2 The Erlang Language
13.3 Compilation of Erlang to Strand
13.4 System Architecture
13.5 Performance
13.6 The Experimental System
13.7 Discussion


Bibliographic Notes 


A Predefined Tests, Processes and Operators 


A.1 Guard Tests
A.2 Predefined Processes
A.3 Predefined Operators


Acknowledgments 


We would like to extend our sincere thanks to those who contributed case studies 
to this book. Carl Kesselman at The Aerospace Corporation provided Chapter 
10: Reasoning About Equality. At Argonne National Laboratory, the members of 
the computational biology group: Ralph Butler, Tracye Butler, Nicholas Karonis, 
Robert Olson, Ross Overbeek, Nathan Pfluger, Morgan Price and Steve Tuecke 
contributed Chapter 11: Aligning Genetic Sequences. Martin Gittins of Strand 
Software Technologies Inc. provided Chapter 12: Discrete Event Simulation. At 
Ellemtel, Joe Armstrong and Robert Virding contributed Chapter 13: Programming
Telephony.

The book design and layout reflects the caring attention of Anna Taylor. We 
thank Don Martinetti for investing so much thought into the illustrations throughout
the book. Nan Boden, Dian De Sha and Will Winsborough made substantial
contributions to the text by reading and thoroughly checking the manuscript. 
Nan’s careful attention to detail while testing the programs removed countless 
embarrassing errors. Ross Overbeek used a draft of the book to teach a class in 
parallel processing at the University of Texas at Austin, providing valuable feedback.
The students of CS 284b at Caltech provided numerous suggestions on both
this book and the Strand system. 

That this book is supported by a commercially available programming system 
owes much to the pioneering spirit of the directors of Strand Software Technologies 
Inc. In particular, David Butler, Martin Gittins and David Catton have provided 
us with constant support and encouragement. Jaqui Dowsett contributed valuable 
expert guidance in ensuring that the book is consistent with the Strand product 
line. The quality of the current programming tools is a direct reflection of the 
hard work and enthusiasm of the development team; in particular, Richard Barnes, 
Andrew Dinn, Will Pickles and Andy Sizer. 

Finally, we would like to express our thanks to Argonne National Laboratory, 
Imperial College and California Institute of Technology for the support we have 
received while preparing this manuscript. 


Strand™


New Concepts in Parallel Programming 







Chapter 1 


Introducing Strand 


“The programmer, like the poet, works only slightly removed from pure
thought-stuff. He builds his castles in the air, from air, creating by
exertion of the imagination. Few media of creation are so flexible, so easy
to polish and rework, so readily capable of realizing grand conceptual
structures.”


F.P. Brooks, Jr. 
The Mythical Man-Month 


A new dimension in programming has arrived, one that will test our creative abilities
and push programmers to the limits of their understanding. Parallel computers
are now readily available and cost-effective for a number of programming problems.
As we push toward the next century the dominant issues will concern how to
utilize these powerful new machines. We can expect them to contain increasingly
larger numbers of computers, to increase in performance and to decrease in physical
size. This frontier will require new programming techniques, new algorithms
and new methodologies. Now is the time to experiment with alternative ways to 
express ourselves; now is the time to consider the organizational principles needed 
to sustain the development of large programs that will execute on thousands of 
computers. In response to this demand for creative thinking, this text presents a 
new Strand of ideas — a thread we hope you will pick up and pursue. 


1.1 An Analogy 


One of the most important aspects of human behavior is teamwork. Many major 
endeavors rely not on a single individual but rather on a collection of individuals 
who interact and cooperate to achieve a common goal. Consider large construction
projects such as building a housing complex or constructing a highway. Rarely are
these tasks completed by a single person working in isolation. Some projects are so 
vast that they could never be completed in a single lifetime without team effort. 
To master the complexity of a large undertaking, it is invariably necessary to 
decompose the task into constituent parts that can be approached independently. 
It should not be surprising that an analogous situation has become evident in 
computer science. As our ability to manufacture large numbers of computers 
steadily grows, it is natural that computers should be required to collectively solve 
problems. To achieve this organization, parallel programs must be structured as 
a collection of interacting components. Just as the complexity of a construction 
project is mastered using decomposition, it is natural to expect this strategy to 
play an important part in parallel program design. Let us take the analogy further 
and examine the way in which large construction problems are solved. In doing so, 
we aim to point out some problematic aspects of designing large parallel programs 
and to introduce some useful ideas. 


1.1.1 Problem Solving 


Problem decomposition is in essence a task of recognizing abstractions. For example,
in building a house a project manager must schedule the laying of foundations
but need not be concerned with how to mix concrete. The laying of foundations is
an abstraction consisting only of the information that is important to the manager,
namely, the task completion date. By dealing only with an abstraction, the manager
is able to disregard irrelevant details. The question is, how are abstractions
formed? 

One method is to repeatedly divide a task into successively smaller and more 
manageable subtasks. For example, the task of building a house might be divided 
into subtasks such as raising walls, adding windows, making electrical connections 
and constructing a roof. Eventually, this decomposition strategy results in elementary
tasks, such as hanging a door or connecting an electrical outlet, that can be
completed by a single individual. Of course, errors sometimes occur; for example, 
a lighting fixture is connected and subsequently the wiring is found to be faulty. 
This requires some backtracking to redo parts of the task; in this example the 
fixture is removed, the wiring replaced and the fixture reconnected. 

This method of forming abstractions is analogous to a computer science design 
methodology termed stepwise refinement. The method involves the gradual 
refinement of a program beginning with a specification of the initial goal. The goal 
is repeatedly divided into smaller, more manageable, constituent parts. At each 
stage detail is added to the specification until eventually the statements written can 
be coded directly. As we would expect, errors sometimes occur and backtracking 
in the design process is necessary to correct the specification. Stepwise refinement 
develops both the design of a program and its code hand in hand. 







Unfortunately, this form of decomposition is insufficient for large tasks since it 
does not encourage the recognition of commonalities between subtasks. Consider 
the task that confronts the developer of a large housing complex. It is obvious 
that many of the houses may use what are fundamentally the same building plans, 
building techniques and materials. Isolating these commonalities can save cost and 
provide improved reliability; having been used once, a design can be reused time 
after time. The analogy holds in computer science: Stepwise refinement is not 
sufficient as a program design methodology because recognition of commonality 
is necessary in order to save effort, increase reliability and provide more powerful 
abstractions. Stepwise refinement is also deficient in another aspect: It does not 
encourage the recognition of changeable properties in a design. In the life cycle of 
a large software product, it is inevitable that software will be adapted to changing 
requirements. 

For large programs a design methodology called modular decomposition is 
preferred. This methodology stresses a particular approach when decomposing a 
task into subtasks. The basic idea is to begin system design by listing components 
that are common to various parts of the system or are likely to change. The 
design is then composed of a collection of functional units, called modules, each of 
which encapsulates and hides the implementation of one component. Each module 
includes an interface that designates the functionality that the module provides 
to the rest of the system. System design is complete when all interfaces have been 
specified. At this point individual modules may be developed independently. 




A final aspect of problem solving, often overlooked, is that it has some historical
qualities. If, every time a house was built, it were necessary to re-invent the
wheelbarrow, few homes would ever reach completion. Progress relies on making
use of tried-and-tested ideas, sometimes making incremental improvements and
sometimes using radical innovations. An analogous situation occurs in the development
of large parallel programs. Tremendous efforts have been made to develop
code segments for uniprocessors in languages such as C, Fortran and Cobol. These
efforts have been undertaken at considerable cost and can provide aid when developing
future programs. Programming parallel machines will require radical new
algorithms and techniques; however, it is often expedient to be able to employ 
existing sequential algorithms where they apply. 


1.1.2 Further Observations 


Returning to the construction project analogy, an interesting feature of large tasks 
is that elementary subtasks are invariably only partially ordered. The partial order 
is important as it allows various aspects of the project to proceed together, thus 
reducing the time taken to complete the project. For example, the walls of a 
house must be built before the roof but the windows, electrical connections and 
plumbing can often be installed more or less at the same time. In addition, the 
partial order permits a degree of flexibility for alternative schedules. For example, 
if materials do not arrive on time, other tasks may proceed until the required 
materials become available. Here again, the analogy in programming is clear:
Partial orderings provide the opportunity to execute program segments in parallel
to improve performance; they also allow flexible scheduling strategies. It is thus 
desirable that the design process be conducted in a manner that encourages a 
partially ordered specification. 
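
The idea of a partial order can be made concrete with a small sketch. The fragment below is written in Python, not Strand, and is offered only as an illustration. The task names and the dependency table are assumptions chosen to mirror the house-building analogy; the text itself states only that the walls must precede the roof. The sketch uses the standard `graphlib` module (Python 3.9 or later) to group tasks into stages, where the tasks within one stage are mutually unordered and could proceed at the same time.

```python
from graphlib import TopologicalSorter

# Hypothetical task table: each task maps to the tasks that must finish first.
# Only "walls before roof" comes from the text; the rest is assumed.
DEPENDS_ON = {
    "foundations": set(),
    "walls": {"foundations"},
    "roof": {"walls"},
    "windows": {"walls"},
    "electrical": {"walls"},
    "plumbing": {"walls"},
}


def parallel_stages(depends_on):
    """Group tasks into stages; tasks in the same stage are unordered."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        # Every task whose prerequisites are all complete is ready now.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = parallel_stages(DEPENDS_ON)
for number, tasks in enumerate(stages, start=1):
    print(number, tasks)
```

Under these assumptions the last stage contains four tasks, which is exactly the freedom a scheduler can exploit: any of them may be delayed, for instance while materials are awaited, without holding up the others.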

The partial order of events in a construction project is often outlined in the 
early stages by a project plan. The plan sets deadlines that serve to sequence 
events. For example, the wiring must be laid out by a particular date so that 
lighting fixtures can be added. At each deadline either a task must be complete or 
resources, such as window frames, must be available. Implicit in a project plan is 
the understanding that various types of information are to be exchanged between 
those who carry out the plan. For example, construction managers hold meetings 
at which they discuss progress; team leaders organize work parties by providing 
guidance and information to workers. Through this complex chain of interactions, 
projects progress by the cooperation of those involved. It is thus clear that the 
organizational problems in a large project are intimately concerned with communication
between individuals and with the synchronization of events. By analogy,
it is natural to expect synchronization and communication to be problematic in 
the design of large parallel programs. Once again, it is desirable that the design 
process be conducted in a manner that encourages a natural formulation that 
includes these concepts. 


1.1.3 Summary 


Parallel programs are composed of a collection of independent entities that work 
as a team to solve a problem or perform a task. Abstraction and encapsulation are 
the programmer’s most powerful tools for solving problems. Stepwise refinement 
and modular decomposition are the most useful methods by which these tools can 
be employed. A programming notation in which these methods are used should 
encourage a partially ordered specification and make it possible to reuse existing 
sequential code. It should also provide simple and effective methods for organizing 
communication and synchronization. 


1.2 Tying Concepts Together 


This book introduces Strand, the first commercially available concurrent logic 
programming language. Over the last few years researchers working in logic programming
have developed a variety of new parallel programming techniques and
ideas. Unfortunately, these concepts have been presented only in a fragmented 
form through the research literature; this has made them inaccessible to students. 
This book seeks to redress the situation by giving a coherent presentation of the 
techniques and illustrating the design processes in which they are employed. 











1.2.1 Design Methodology 


Strand encourages the use of two distinct program design methodologies: modular
decomposition and stepwise refinement. The former is used for programming
in the large, whereas the latter is used primarily for programming in the small.
Large problems are initially decomposed into modules with carefully defined interfaces.
Modules are then developed independently by incrementally refining the
interface specification using stepwise refinement. Although other chapters will 
provide detailed descriptions of program design, we seek to provide a flavor of the 
programming style here. 

Consider the problem of simulating the building of a housing project. The 
design task would begin by listing components of the problem that are likely 
to change or are common to various parts of the system. Example components 
might include the algorithm used to divide a construction site into lots or the 
method used to construct an entity (house, factory, office building) on a given 
lot. A collection of modules would then be designed to encapsulate these design 
decisions. Each module includes an interface definition that lists the functions 
it provides, or exports, to other modules in the system. The implementation 
of these functions is contained within the module along with any necessary data 
structures and associated operations. Should the system requirements change 
over time, only a subset of the modules will be affected. In addition, modules are 
sufficiently self-contained that they can be reused for other applications. 

One common activity in our analogy is the building of a single house. A 
natural formulation of the problem might include a module that encapsulates 
this activity. The interface to the module might include functions for building a 
house, returning attributes of a house, numbering a house, etc. The module design
and implementation would begin with an outline specification for each exported
function, for example: 


build_house :-
    lay_foundations,
    raise_walls,
    add_windows,
    make_electrical_connections,
    construct_roof.


This formulation is actually a Strand program; the symbol ‘:-’ separates a problem
from its decomposition into subproblems. Note that the program naturally
expresses the problem to be solved and provides a high level of abstraction. This is 
due to the use of symbols to express concepts; there is a close relationship between 
the statement and its English equivalent: 


“To build a house we must lay the foundations, raise the walls, ...and 
construct a roof.” 


An interesting aspect of the outline specification is that it is only partially ordered.
Each of the subproblems in the above formalization may be considered the
responsibility of an independent entity that we shall refer to as a process. The 
specification states that five processes must work together as a team to build a 
house. There is no notion of ordering here; the processes can be executed in any 
order or in parallel. Thus, an equivalent formulation of the same problem might 
be: 


build_house :-
    construct_roof,
    raise_walls,
    lay_foundations,
    make_electrical_connections,
    add_windows.


A module is usually a relatively small program and thus can be constructed incrementally
with stepwise refinement. The above program would be refined by
adding detail and specifying each process in turn. For example: 


lay_foundations :-
    dig_hole,
    mix_sand_and_cement,
    fill_hole.


At this point we might expect some synchronization between the actions to be 
necessary. For example, before the cement is mixed the hole should be dug. This 
synchronization can be introduced by a further refinement that adds a communication
channel between the corresponding processes. For example:


lay_foundations :-
    dig_hole(Done),
    mix_sand_and_cement(Done),
    fill_hole.


This channel can be used by the dig_hole process to indicate that digging is complete.
The mixing process would wait to receive a message from the digging process
before performing its designated task. A similar refinement can be used to ensure 
that the concrete is mixed before the hole is filled. 


lay_foundations :-
    dig_hole(Done),
    mix_sand_and_cement(Done,Done1),
    fill_hole(Done1).


Continued refinement eventually produces a complete Strand program that can be 
executed to simulate house building. A detailed description of how these programs 
operate is provided in Chapter 2; for the present, it is sufficient to point out that 
the refinement procedure naturally leads the programmer to the point of specifying 
communication and synchronization. 
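
For readers who would like to see the synchronization pattern run before reaching Chapter 2, the following is a rough analogue in Python, not Strand. It is a sketch under stated assumptions: each process is modeled as a thread, and each channel (Done, Done1) is modeled as a `threading.Event`. Strand's own mechanism is described in Chapter 2 and is not reproduced here. The function names simply echo the processes above, and the `log` list is a hypothetical device for observing the order of completion.

```python
import threading


def lay_foundations():
    """Run three workers concurrently; events enforce dig -> mix -> fill."""
    done = threading.Event()    # stands in for the channel Done
    done1 = threading.Event()   # stands in for the channel Done1
    log = []                    # records the order in which work completes

    def dig_hole():
        log.append("dig")
        done.set()              # signal that digging is complete

    def mix_sand_and_cement():
        done.wait()             # wait for the digging process
        log.append("mix")
        done1.set()             # signal that the concrete is mixed

    def fill_hole():
        done1.wait()            # wait for the mixing process
        log.append("fill")

    # Start the workers in the "wrong" order on purpose: the events,
    # not the start order, determine the sequence of work.
    workers = [
        threading.Thread(target=task)
        for task in (fill_hole, mix_sand_and_cement, dig_hole)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return log


print(lay_foundations())
```

As in the refined specification, nothing orders the three workers except the two channels, so the textual order of the process calls remains irrelevant.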

One further refinement step is necessary to complete the program design. In 
general, current Strand implementations, while capable of running processes on 
multiple computers, do not do so unless given explicit instructions by the programmer.
The Strand system provides a variety of simple tools to help with this
task. If we suppose that the above program were to be executed on a network 
composed of just two computers, the following specification might be written: 


lay_foundations :-
    dig_hole(Done)@other,
    mix_sand_and_cement(Done,Done1),
    fill_hole(Done1)@other.


Assume that the lay_foundations process executes at one computer; this program 
causes the dig_hole and fill_hole processes to be executed on the other computer
while the mix_sand_and_cement process executes on the current computer. 

Although there are many possible ways to connect computers, there are generally
only a small number that are interesting from a programming perspective.
The Strand system provides a collection of virtual machines that are abstractions
of the physical hardware. Each virtual machine is a uniform structure (e.g.,
mesh, ring, tree, linear array) that can be programmed more easily than an arbitrary
network of computers. Simple annotations (similar to the @other annotation
shown above) are used for mapping processes over these virtual machines. This 
methodology reduces the programming task to one of mapping a problem to a 
convenient virtual machine using some simple tools. The virtual machine can be 
mapped to a variety of architectures with an appropriate system library; thus, 
applications can achieve a level of hardware independence. 




1.2.2 Performance Enhancement and Extensibility 


As a high-level programming notation Strand provides the ability to rapidly prototype
parallel programs; however, the primary motivation in parallel programming
is speed. Strand is extensible in that the programmer may define and use new data 
types with associated operations. These definitions may be written in lower-level 
languages affording performance enhancement. In this way a programmer may 
isolate performance bottlenecks and incrementally improve performance. 

The extensibility property provides a simple mechanism for reusing existing 
code segments in C, Fortran and other languages. This is achieved by designing
processes that encapsulate the pre-existing code. A problem solution is then
constructed in terms of these processes. Another valuable application of the extensibility
property is to define processes that utilize the vectorizing hardware
provided by many parallel computer manufacturers. 


1.2.3 Concluding Remarks 


Strand is not a solution to all the world’s parallel programming problems. It is 
a simple notation that represents an evolutionary step in the design of parallel 
programming languages; one that is readily available and has grown out of the 
authors’ collective research. 

We do not claim that the language is necessarily appropriate to all problems 
and computing engines but we have been pleasantly surprised by its utility to 
date. It has been used on networks of Sun computers, Transputer surfaces, hypercube
architectures, shared memory machines and mesh-connected architectures.
A variety of non-trivial application domains have been investigated. These include
telephony, discrete-event simulation, DNA sequencing, automated reasoning, compilers,
distributed databases and programming environments. A number of these
are described in case studies at the end of this book. 

In providing this practitioner’s text, it is our hope to encourage users to get 
hands-on experience in programming parallel machines. Parallel programming 
is a relatively young discipline; only by diving deeply into the murky waters of 
practical experimentation can we hope to surface in the clearer waters of new ideas 
with a firm appreciation for the fundamental problems. 


1.3 An Overview of the Book 


The book is divided into three distinct parts. The first is devoted to basic con- 
cepts and consists of the first five chapters. The second part describes advanced 
techniques and comprises chapters six through nine. The final part, consisting of 
chapters ten through thirteen, is a collection of case studies. 

Basic Concepts. Chapter Two is intended for readers who have not encoun- 
tered concurrent logic programming before and provides the necessary preliminar- 
ies. It includes a description of Strand features used to describe data, processes, 
communication and synchronization. 


12 Chapter 1. Introducing Strand 


Chapter Three presents six basic programming techniques. These techniques 
provide the basic building blocks from which all parallel applications are con- 
structed. Some of the techniques provide communication protocols; others pro- 
vide mechanisms for organizing data structures or sequencing complex tasks. In 
essence, Strand programming revolves around repeated use of these few simple 
ideas. 

Chapter Four introduces two fundamental approaches to solving problems and 
illustrates program design by stepwise refinement. Two solutions of a single prob- 
lem are developed. The first emphasizes the definition of process structures; the 
other is based on the definition of data structures and associated operations. 

Chapter Five shows how a number of standard problems are solved in Strand 
using techniques introduced in previous chapters. 

Advanced Techniques. Chapter Six is concerned with programming in the 
large and describes the basic methodology, programming techniques and linguistic 
support. 

Chapter Seven shows how to extend Strand with user-defined data types and 
associated operations. It includes a description of how existing code segments 
written in languages such as C and Fortran can be integrated into Strand programs. 

Chapter Eight illustrates parallel programming techniques for mapping pro- 
cesses over virtual machines. 

Chapter Nine explains how to implement program transformations and inter- 
preters. These ideas are useful for adding language features to Strand. 

Case Studies. The final four chapters of the book are devoted to a collection 
of case studies that have been contributed by a variety of industrial and academic 
sources. Each case study describes an application and outlines its solution in 
Strand. The studies describe the problem domain, specify the particular problem 
solved and describe the issues involved. These are intended to illustrate useful pro- 
gramming ideas or organizational concepts that were encountered during program 
design and implementation. 


Part I 


Basic Concepts 





Chapter 2 


Strand Programming 


This chapter introduces the programming notation used to express Strand pro- 
grams and gives an operational description of how programs execute. The de- 
scription here is informal and provides the necessary information to understand 
the remainder of the book. 

How To Read This Chapter. The chapter has been designed with two 
different types of reader in mind. Those who are familiar with concepts such as 
data structures, processes and synchronization can read it quickly, focusing on 
only the major differences between Strand and other languages. Those who are 
not acquainted with these basic concepts should use the text in a more tutorial 
manner. To develop an appreciation for the subtleties involved, it is important to 
complete all the exercises. 


2.1 Denoting Data 


Strand programs manipulate data structures called terms. Each term is either 
a number, string, variable or structure. Let us examine how these different data 
items can be denoted within the text of a program. 

Numbers. A number is denoted by itself and is written directly in a program. 
For example: 


1, 3.7, -4, 47.6, -23.5


Two of these numbers, 1 and -4, are integers; the others are all real numbers.

Strings. A string denotes a symbolic entity in a program. Each string is a
sequence of characters enclosed in single or double quotes. For example:


‘January’ 
‘Plot number 7’ 
“Fred and Wilma” 
“This is a string, it includes a comma and a period.” 




If a string cannot be confused with other language terms, the enclosing quotes 
may be omitted. For example: 


manager 
worker1234 

big-house 
small_person 


Variables. A variable represents a value or entity that is to be computed.
Each variable is denoted by a sequence of alphabetic characters and/or numbers
beginning with an uppercase letter or the character “_”. For example:


January
_temp
House57
This_variable
_


Notice that the variable January is distinguishable from the string ‘January’ because
the latter is enclosed in quotes. The last variable is the anonymous variable; this
is used to denote an unnamed or unimportant value.

Structures. A structure represents a collection of data items and may in- 
clude other structures. There are two basic organizational schemes for structures: 
tuples and lists. The organization chosen will depend on the manner in which 
the elements of the structure are to be accessed. If there are a fixed number of 
elements that are accessed randomly, then the best representation is a tuple. If 
there are a varying number of elements that are accessed sequentially, then the 
best representation is a list. 




Tuples. A tuple is denoted by a collection of data items enclosed in braces,
i.e., { }. The size, or arity, of a tuple is the number of arguments it contains. For 
example, the following tuple may represent a building: 


{house,77,{bedrooms,3},Owner} 


In this case the arity is 4 and the arguments are: the string house, the number 
77, a tuple of arity two and the variable Owner. 

If the first argument of a tuple is a string, then the tuple can be written in a
more concise prefix notation. In this representation the string names the structure
represented. The previous example can be expressed in a variety of ways with this
notation:


house(77,bedrooms(3),Owner)
house(77,{bedrooms,3},Owner)
{house,77,bedrooms(3),Owner}


All of these forms denote the same data structure which is the following n-ary tree: 


[Figure: n-ary tree whose root has four branches: house, 77, a subtree with the
two branches bedrooms and 3, and Owner.]


Lists. A list is used to represent a sequence of data items. Each list element 
is denoted by [Head|Tail], where both Head and Tail are terms; an empty list is 
denoted by []. The head of a list represents the first element in a sequence; the 
tail represents the remainder of the sequence. For example: 


[house|[77|[{bedrooms,3}|[Owner|[]]]]] 


denotes a list which contains the same data items as the example tuple shown 
earlier. This way of writing lists is cumbersome so a more concise notation is 
preferred: 


[house,77,{bedrooms,3},Owner] 


The structure denoted by the list notation is completely different from that of the 
tuple shown earlier. The tuple represents an n-ary tree while the list represents 
the following binary tree: 


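[Figure: binary tree for the list. Each node has a list element on its left branch
and the rest of the list on its right branch. The elements are house, 77, the tuple
with arguments bedrooms and 3, and Owner; the final right branch is [].]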


Compare the data structures for the tuple and the list. In the tuple representa- 
tion it is necessary to count through the arguments to access the data items in 
sequence. In the list representation, sequential access is achieved by following the 
right branch of the tree. 
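
The difference in access patterns can be illustrated outside Strand. The following Python sketch is an analogy only, not Strand code: a Python tuple stands in for a Strand tuple, and nested (head, tail) pairs stand in for list cells. The names NIL, cons and to_sequence are our own and do not appear in Strand.

```python
# Analogy only: Python tuples model Strand tuples; (head, tail) pairs model list cells.
NIL = None  # stands in for the empty list []


def cons(head, tail):
    """Build one list cell, the analogue of [Head|Tail]."""
    return (head, tail)


def to_sequence(cell):
    """Follow the right branch of the binary tree, collecting each head in turn."""
    items = []
    while cell is not NIL:
        head, cell = cell
        items.append(head)
    return items


# {house,77,{bedrooms,3},Owner}: fixed size, any argument reachable by position.
building = ("house", 77, ("bedrooms", 3), "Owner")

# [house,77,{bedrooms,3},Owner] written out as [house|[77|[{bedrooms,3}|[Owner|[]]]]].
listing = cons("house", cons(77, cons(("bedrooms", 3), cons("Owner", NIL))))

print(building[2])           # positional access to the third argument
print(to_sequence(listing))  # sequential access by following tails
```

The tuple is indexed directly, whereas the list is consumed one cell at a time, which is why lists suit sequences whose length is not known in advance.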

It is often valuable to be able to denote the continuation of a sequence without 
mentioning its elements explicitly. For example: 


[house,77|Rest]


This list represents a structure whose first two elements are house and 77 and
whose remaining elements are denoted by the variable Rest.


Four types of data:


• numbers
• strings
• variables
• structures





Exercises 


2.1 What types of data structure are the terms: 


(a) -57 (b) [] (c) pi (d) {X} 
(e) ‘Hello World’ (f) Hello-World (g) 22222222 (h) worker(‘Jim’) 


2.2 Draw the data structures corresponding to the terms: 
(a) [1,2,[],{X},4,john]

(b) {1,[22.3|Y],3,4,fred}

(c) {worker,“John Smith”,{age,25},height(‘5 feet’)}

(d) [[[[a,b|Rest1],23]|Rest2],{[[[[],Rest3],23]|[]]}]


2.3 In Problem 2.2.d, what is the difference between Rest2 and Rest3? 


2.4 What properties distinguish a list from a tuple? 


2.2 Defining Processes 





Computation in Strand is performed by a set of interacting processes. Each process 
can be represented by a term of the form: 


p(T1,T2,...,Tn)    (n ≥ 0)


where p/n identifies the program used to execute the process and T1,...,Tn are the
process arguments. The arguments are data structures (terms) that comprise the
process state. A Strand program describes the actions that processes may perform.
There are just three types of action: terminate, change state and fork. 


Process Actions:


• terminate
• change state
• fork





To illustrate process actions consider a process which simulates the behavior of 
a Hollywood architect. This architect only builds two types of house: tasteless 
and ostentatious. Tasteless houses are expensive. Ostentatious houses are very 
expensive because the architect must employ artists to aid in the design. The 
architect is most selective and will have nothing to do with other types of house. 










We choose to represent the architect by a process of the form:


architect(HouseType,Design)


Figure 2.1 shows some possible process actions involving the architect. In the first
case, the architect is asked to design a tasteful house and ignores the request;
the corresponding process terminates and nothing further happens. In the second
case, the architect is asked to design a tasteless house and begins work immediately.
The corresponding process changes state to a designer and subsequently
proceeds to design the house. In the last case, the architect is asked to design an
ostentatious house and must employ creative assistance. Here the corresponding
process both changes state to a designer and spawns two new processes: an artist
and a landscaper.


architect(tasteful,House)       -- terminate -->      Nothing to do

architect(tasteless,House)      -- change state -->   designer(House)

architect(ostentatious,House)   -- change state -->   designer(Design)
                                   + fork             artist(Design,Concept)
                                                      landscaper(Concept,House)


Figure 2.1: Architect Process Actions

Notice that the designer and artist share the variable Design. This serves as a 
communication channel via which the designer communicates a preliminary design 
to the artist. Similarly, the variable Concept is used to communicate the artist’s 
creation to the landscaper who produces the final house design (House). 

Describing Process Actions. Strand provides a concise notation for de- 
scribing process actions. Programs are composed of a set of rules, each of which 
describes a single action. Rules have the following general form: 


H :- G1, G2, ..., Gm | B1, B2, ..., Bn.    m,n ≥ 0


where H is the rule head, ‘:-’ is the implies operator, the Gs comprise the
rule guard, ‘|’ is the commit operator and the Bs are the rule body. The 
period signifies the end of a single rule. The rule head has the same form as a 
process. Initially, we restrict attention to rules in which the head contains only a 
single occurrence of any variable; this restriction will be relaxed later. The rule 
guard contains a sequence of predefined test operations. The rule body contains 
a collection of predefined (i.e., built-in) and/or programmer-defined processes. If 
the rule guard is empty, the commit operator may be omitted. If the body is 
empty but the guard is not, the rule is written: 


H :- G1, G2, ..., Gm | true.


If both the guard and body are empty, the rule is written:


H. 




Rules with the same name and number of arguments are grouped to form a pro- 
cess definition. 


Process Definition: 


Set of rules with same name 
and number of arguments. 





Rules define the three basic process actions in the following manner. The head 
and guard collectively specify a set of preconditions under which a process may 
execute. A rule whose body is empty specifies that if the preconditions are satisfied 
then the process terminates. 


A rule of the form:


H :- G1, G2, ..., Gm | B1.


specifies that the process changes state to the process B1 if the preconditions are
satisfied.

A general rule of the form:


H :- G1, G2, ..., Gm | B1, B2, ..., Bn.


specifies that, if the preconditions are satisfied, then the process changes state
to one of the processes B1, B2, ..., Bn and simultaneously spawns, or forks, the
remaining processes.





Figure 2.2 illustrates the use of rules to specify process actions by showing a 
process definition that implements the Hollywood architect. 


Termination:     architect(Type,_) :-
                     Type =\= tasteless, Type =\= ostentatious | true.

Change State:    architect(tasteless,House) :- designer(House).

Change State     architect(ostentatious,House) :-
+ Fork:              designer(Design),
                     artist(Design,Concept),
                     landscaper(Concept,House).


Figure 2.2: An Architect Process Definition 




Let us consider in detail how these rules specify the architect’s actions. Each 
rule can be decomposed into preconditions and actions. For example, the first rule 
reads: 


“If the first argument is neither tasteless nor ostentatious, then termi- 
nate.” 


Decomposing the rule we arrive at the following preconditions and action: 


Preconditions: 
1. The process must be named architect and have two arguments. 
2. The first argument must not be the string tasteless. 


3. The first argument must not be the string ostentatious. 


Action: Terminate. 


Preconditions 2 and 3 are expressed using a guard test =\= which checks for 
inequality. A variety of useful predefined tests are provided by the Strand system 
and are described in later sections. The last rule reads: 


“If the first argument is ostentatious, then change state to either a 
designer, an artist or a landscaper process and spawn the remaining 
processes.” 


This rule is decomposed as follows: 


Preconditions: 
1. The process must be named architect and have two arguments. 
2. The first argument must be the string ostentatious. 

Actions:
1. Change state to either the designer, artist or landscaper process.
2. Fork the remaining two processes.


The second precondition in this rule is specified by the string ostentatious in the
rule head. This is an example of a fundamental Strand mechanism termed matching.
The rule is used only if the first argument of the process matches the string
ostentatious. Matching will be explored in detail in Section 2.5.2.
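
The three rules of the architect definition can be mimicked in Python as a rough illustration. This is a sketch under stated assumptions, not Strand itself: a process is modeled as a tuple, a reduction returns the list of processes that replace it, and fresh variables are modeled as generated names. The helpers new_var and reduce_architect are hypothetical, and the sketch does not model suspension on an unassigned first argument.

```python
import itertools

_fresh = itertools.count(1)


def new_var(name):
    """Return a fresh name such as 'Design_1', standing in for a new Strand variable."""
    return f"{name}_{next(_fresh)}"


def reduce_architect(house_type, house):
    """Return the processes that replace architect(house_type, house).

    An empty list models termination, a single process models a change of
    state, and several processes model a change of state plus forks.
    """
    if house_type == "tasteless":
        return [("designer", house)]
    if house_type == "ostentatious":
        design, concept = new_var("Design"), new_var("Concept")
        return [
            ("designer", design),
            ("artist", design, concept),
            ("landscaper", concept, house),
        ]
    # Neither tasteless nor ostentatious: the termination rule applies.
    return []


print(reduce_architect("tasteful", "House"))
print(reduce_architect("tasteless", "House"))
print(reduce_architect("ostentatious", "House"))
```

In the last case the designer and artist share one generated name and the artist and landscaper share another, which mirrors the shared variables Design and Concept in the rule body.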




2.3 Playing with Numbers 




Previous sections have described the notation used to define Strand processes. 
We will now use this notation to present solutions to some simple programming 
problems. In doing so, we briefly introduce some fundamental concepts that will 
be developed in subsequent sections. 

Maximum. Given two numbers X and Y, return the larger number as Z. 


max(X,Y,Z) :-
    X > Y | Z := X.
max(X,Y,Z) :-
    X =< Y | Z := Y.


This program introduces the predefined assignment process (:=). This process is 
used to generate a value for a variable. For example, execution of the following 
process using the max definition results in the variable Result being assigned the 
value 31. 


max(27,31,Result) 


Consider the two rules in turn. In the first rule, the important precondition is that 
the value of the first argument X is greater than that of the second argument Y; 
this is expressed using a predefined test ‘>’. If this test is satisfied, then the max 
process changes state to a predefined assignment process. This process assigns 
the value X to the output argument Z. The second rule is similar; in this case the 
precondition requires that X be less than or equal to Y and the assignment process 
assigns the value Y to Z. 




Notice that the two rules are mutually exclusive: For any given situation there 
is only a single rule that will be appropriate. An alternative formulation of the 
problem may use the rules: 


max(X,Y,Z) :-
    X >= Y | Z := X.
max(X,Y,Z) :-
    X =< Y | Z := Y.


Here the rules are not mutually exclusive; either rule may be used in the event 
that X and Y have the same value. Strand does not specify which rule will be used 
when more than one applies. 
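
The effect of overlapping guards can be sketched in Python. The model below is an illustration under our own assumptions: each rule is a (guard, result) pair, the set of applicable rules is computed explicitly, and a random choice stands in for Strand's unspecified selection. The names RULES, applicable and strand_max are hypothetical.

```python
import random

# Each rule is a (guard, result) pair over the inputs x and y.
RULES = [
    (lambda x, y: x >= y, lambda x, y: x),  # max(X,Y,Z) :- X >= Y | Z := X.
    (lambda x, y: x <= y, lambda x, y: y),  # max(X,Y,Z) :- X =< Y | Z := Y.
]


def applicable(x, y):
    """Return the indices of all rules whose guard succeeds."""
    return [i for i, (guard, _) in enumerate(RULES) if guard(x, y)]


def strand_max(x, y, rng=random):
    """Pick any applicable rule; which one is chosen is deliberately unspecified."""
    index = rng.choice(applicable(x, y))
    return RULES[index][1](x, y)


print(applicable(27, 31))  # only the second rule applies
print(applicable(5, 5))    # both rules apply
print(strand_max(5, 5))    # the result is the same whichever rule is used
```

The program remains correct because every rule that can be chosen for equal inputs produces the same result.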

Area. Given an object type T and a dimension D, return the area of the object 
as V. If T is a circle, D is the radius. If T is a square, D is the length of one side. 
If T is neither a circle nor a square, then return zero. 


area(square,D,V) :-
    V is D * D.
area(circle,D,V) :-
    V is 3.142 * D * D.
area(T,D,V) :-
    T =\= square, T =\= circle | V := 0.


This example illustrates the use of a predefined process is, which is used to perform 
arithmetic. This process has the form: 


X is Expression 


where Expression is an arithmetic expression which can include the usual operators 
(e.g., +, -, *, /). A complete list of the operators available in Strand is given in
Appendix A. 

Largest Area. Given two objects O1 and O2 which have associated dimen- 
sions D1 and D2, return the area A of the larger object. Object types and dimen- 
sions are as specified in the previous example. 


largest(O1,D1,O2,D2,A) :-
    area(O1,D1,A1),
    area(O2,D2,A2),
    max(A1,A2,A).


This process definition consists of a single rule that has no preconditions. A largest 
process can hence immediately change state to an area process that computes the 
area A1 of the first object. It simultaneously spawns two new processes. The first 
computes the area A2 of the second object. The other computes the larger area A 
of areas A1 and A2. 

Execution of the following process, using the process definitions given in this 
section, computes a Result of 25. 


largest(circle,2,square,5,Result) 


This initial process is executed using the rule for largest, yielding three processes. 
These processes then execute using the area and max programs. Thus, execution 
of the initial process leads to a number of process actions. 


Exercises 
2.5 Separate the preconditions and actions in the area process. 


2.6 Define a process legs(Bug,Legs) that, for a variety of bugs, returns the number 
of legs on the bug. 


2.7 Define a process that calculates the total number of legs on three bugs. 


2.8 Define a process power that takes a number N and an integer J and returns
N raised to the power J.


2.4 How Processes Execute 





A Strand computation corresponds to a pool of concurrently executing processes. 
Recall that each process is represented by a term of the form: 


p(T1,T2,...,Tn)    (n ≥ 0)


where p/n identifies a process definition and the terms T1,T2,...,Tn are the process
data state. Computation proceeds by repeatedly selecting a process and removing 
it from the pool. If the process is a predefined process (such as X:=Y) it is executed 
immediately; otherwise a reduction attempt is made. This involves selecting a rule 
from the program, matching the process to the rule head and executing the rule 
guard. If the preconditions specified by the head and guard are satisfied, then the 
process commits. This causes new copies of the processes defined in the rule body 
to be added to the process pool. 

To illustrate this computational model, we follow the execution of the following 
process using the programs presented in Section 2.3. 


largest(circle,2,square,5,R) 


Initially, the process pool contains only this single process. Recall that the process 
definition for largest consists of a single rule: 


largest(O1,D1,O2,D2,A) :-
    area(O1,D1,A1),
    area(O2,D2,A2),
    max(A1,A2,A).


Since this rule has no preconditions it can be used immediately to reduce the 
process. This results in a new process pool containing three processes: 


area(circle,2,A1) 
area(square,5,A2) 
max(A1,A2,R) 


Notice how data is propagated to the new processes according to the placement 
of variables in the rule. Any of these processes may now be chosen for reduction; 
indeed, if several computers are available, all three processes may be executed 
at the same time. Let us assume that the first process is now removed from 
the process pool. This can be reduced using the second rule of the area process 
definition: 


area(circle,D,V) :-
    V is 3.142 * D * D.


This gives a new process pool which contains an instance of the predefined process 
is: 


A1 is 3.142*2*2 
area(square,5,A2) 
max(A1,A2,R) 


Table 2.1: Executing the largest process.


Step  Pick  Result          Process Pool
0     -     -               largest(circle,2,square,5,R)
1     1     change state    area(circle,2,A1), area(square,5,A2),
            + fork          max(A1,A2,R)
2     1     change state    A1 is 3.142 * 2 * 2,
                            area(square,5,A2), max(A1,A2,R)
3     2     change state    A1 is 3.142 * 2 * 2,
                            A2 is 5 * 5, max(A1,A2,R)
4     2     terminate       A1 is 3.142 * 2 * 2,
                            max(A1,25,R)
5     1     terminate       max(12.568,25,R)
6     1     change state    R := 25
7     1     terminate       <empty> with R = 25


A series of further process reductions follow. Table 2.1 shows one possible sequence 
of reductions. It has four columns; the first simply numbers the computation 
steps. The second designates the process picked from the process pool of the 
previous step. The third column shows the result of the step: fork, change state 
or terminate. The final column shows the contents of the process pool following 
the computation step. 


Notice that the values of A1 and A2 computed in steps 4 and 5 become available 
to the max process. This occurs because the processes created in step 1 share 
variables. This illustrates process communication: The values of A1 and A2 
are communicated from the area processes to the max process. 
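
The process-pool model can be approximated by a small Python simulation. This is a sketch under simplifying assumptions rather than a description of the Strand implementation: the pool is a first-in, first-out queue, a process that needs an unassigned variable is put back in the pool to be retried later, and the separate is and := steps of Table 2.1 are collapsed into a direct assignment, so the trace is shorter than the table. The names Var, val, reduce_process and run are hypothetical.

```python
from collections import deque


class Var:
    """A single-assignment variable; `value` stays None until assigned."""

    def __init__(self, name):
        self.name = name
        self.value = None

    def assign(self, value):
        if self.value is not None:
            raise ValueError(f"{self.name} is already assigned")
        self.value = value


def val(term):
    """Return the value of a term, or None for an unassigned variable."""
    return term.value if isinstance(term, Var) else term


def reduce_process(process):
    """Try one reduction: return the new processes, or None to suspend."""
    name, *args = process
    if name == "largest":
        o1, d1, o2, d2, a = args
        a1, a2 = Var("A1"), Var("A2")
        # Body order is irrelevant in Strand; max is listed first here
        # only so that the suspension is visible in the trace.
        return [("max", a1, a2, a), ("area", o1, d1, a1), ("area", o2, d2, a2)]
    if name == "area":
        kind, d, v = args
        if kind == "square":
            v.assign(d * d)
        elif kind == "circle":
            v.assign(3.142 * d * d)
        else:
            v.assign(0)
        return []
    if name == "max":
        x, y, z = val(args[0]), val(args[1]), args[2]
        if x is None or y is None:
            return None  # the guard suspends: an input is not yet available
        z.assign(x if x > y else y)
        return []
    raise ValueError(f"no definition for {name}")


def run(initial):
    """Reduce processes until the pool is empty; return the names picked."""
    pool, trace = deque([initial]), []
    while pool:
        process = pool.popleft()
        trace.append(process[0])
        result = reduce_process(process)
        if result is None:
            pool.append(process)  # suspended: retry after other processes
        else:
            pool.extend(result)
    return trace


result = Var("R")
trace = run(("largest", "circle", 2, "square", 5, result))
print(trace)
print(result.value)
```

The max process is picked once before its inputs exist and is retried after both area processes have assigned A1 and A2. Communication here is nothing more than one process assigning a shared variable that another process is waiting to read.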


2.5 Basic Operations 


Previous sections have introduced the fundamental concepts of assignment, matching
and guard execution. We now consider these basic operations in more detail.




2.5.1 Assignment 





Assignment is used to generate a value for a variable. Recall that an assignment 
process is denoted in a program text as follows: 


Variable := Value 

One way to visualize this operation is to think of a variable as a box. The box has
a label which corresponds to its name and can hold any data structure. The effect
of executing an assignment is to place a value inside a box; any part of a program
which refers to the variable via its label can then obtain the value. For example,
consider the following assignment process:


Steve := {33} 


Before the assignment is executed the variable corresponds to an empty box: 


[Figure: an empty box labeled Steve.]


After the assignment is executed the value {33} has been placed in the box: 


[Figure: the box labeled Steve, now containing {33}.]


Although this operation is simple, it has some subtle aspects that should be
appreciated. Strand boxes are lined with glue: once a value is placed inside a box it
cannot be removed, and the box is full forever! In the above example the contents
of the box Steve remain {33} indefinitely and cannot be changed. Variables with this
property are often called single-assignment variables.

Since it is possible to place anything inside a box, we should not be surprised 
to find that a box can be placed inside a box. This corresponds to making one 
variable an instance of another. For example, consider the assignment: 


Steve := Fred 


After this assignment is executed any part of the program that refers to Steve 
automatically has access to the Fred box. Another way to think of this is that 
Steve becomes an alias (i.e. another name) for Fred. 

A final aspect of the boxes is that a box cannot be placed inside itself; indeed, 
anything containing a particular box cannot be placed inside that box. Thus, the 
assignments Steve := Steve and Steve := [Fred,Steve,Bert] are illegal. Another way 
of saying this is that circular references and circular terms cannot be manipulated 
in Strand. 


Assignment Rules:


• A variable can only be assigned once.
• Aliasing is permitted.
• No circular references or terms.
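
The three assignment rules can be modeled in a few lines of Python. The sketch below is our own illustration of the box analogy, not Strand's implementation: Var, deref, occurs and assign are hypothetical names, Python lists and tuples stand in for Strand structures, and an attempt to assign an already filled box is reported as an error.

```python
class Var:
    """A box that can be filled at most once."""

    _EMPTY = object()  # marker for a box with nothing inside

    def __init__(self, name):
        self.name = name
        self.content = Var._EMPTY


def deref(term):
    """Follow aliases until reaching a non-variable or an empty box."""
    while isinstance(term, Var) and term.content is not Var._EMPTY:
        term = term.content
    return term


def occurs(var, term):
    """Return True if the box `var` appears anywhere inside `term`."""
    term = deref(term)
    if term is var:
        return True
    if isinstance(term, (list, tuple)):
        return any(occurs(var, sub) for sub in term)
    return False


def assign(var, value):
    """Model of `Variable := Value`, enforcing the three assignment rules."""
    if not isinstance(var, Var):
        raise TypeError("the left-hand side must be a variable")
    if var.content is not Var._EMPTY:
        raise ValueError(f"{var.name} has already been assigned")
    if occurs(var, value):
        raise ValueError(f"circular assignment to {var.name}")
    var.content = value


steve, fred = Var("Steve"), Var("Fred")
assign(steve, fred)   # aliasing: Steve becomes another name for Fred
assign(fred, (33,))   # filling Fred's box also gives Steve a value
print(deref(steve))
```

Because Steve is an alias for Fred, the value later placed in Fred's box is visible through Steve without any further assignment.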





2.5.2 Matching 


Strand employs a matching algorithm to compare a rule head with a process. We 
have already seen this algorithm in action in some simple programs. For example, 
recall the first rule of the area process presented in Section 2.3: 


area(square,D,V) :- V is D * D.


The string square in the rule head represents a precondition: An area process can 
only be reduced using this rule if its first argument matches the string square. 
Execution of the process area(square,12,V) hence involves the matching operation: 


square matched-to square 


These two strings are identical, since they are composed of the same sequence of 
characters. Thus, the matching operation succeeds. 




To develop some intuition about matching, let us consider a variety of other 
attempts at matching. In each case we shall write a part of the process state on 
the left, and the corresponding part of the rule head on the right. First, consider: 


3.1 matched-to 3 


These two terms do not match as they are not the same number. Numbers only 
match if they are identical. In addition, integers do not match their real equivalent; 
thus 1 and 1.0 do not match. 

Now consider matching strings. We have already observed that strings match
only if their characters are pairwise identical. Thus, the following strings do not
match:


“this string” matched-to “that string” 


Two structures match if they are the same size and corresponding subterms match. 
For example, each of the following matches succeed. 


{1,abc}        matched-to    {1,abc}
[1,abc]        matched-to    [1,abc]
{1,[a,b,c]}    matched-to    {1,[a,b,c]}


In contrast, the following matches do not succeed. 


{1,abc}        matched-to    {1.0,abc}
[1,abc]        matched-to    [1,def]
{1,[a,b,c]}    matched-to    {1,{a,b,c}}


In all these cases matching has been used to check the equality of terms. In fact, 
Strand’s matching algorithm is more powerful than these examples indicate and 
has two additional functions: destructuring and data-flow synchronization. 

Destructuring. None of the examples considered previously included vari- 
ables in the rule head. Consider the following match: 


[1,2] matched-to [H|T] 


Although H and T in the rule head are variables, matching can succeed because the 
two terms can be made equivalent. This is achieved by substituting the value 1 for 
H and the value [2] for T. To understand how the match algorithm obtains these 
substitutions let us look at the structures representing the terms being matched: 


The match algorithm traverses both structures in a depth-first manner. A variable 
encountered on the right is assigned the corresponding value on the left; matching
then continues as if the two subterms had been found to be equivalent. In
the example above, this simple algorithm results in the assignments H/1 and T/[2].
This example illustrates how matching can be used to pull apart or destructure 
the process state. In this case, matching has separated a list structure into a head 
part H and a tail part T. 


Matching Rules:


• Corresponding head and process arguments are matched.
• A number or string in the head matches that number or string.
• A structure in the head matches that structure if corresponding
  subterms match.
• Variables in the head match anything.







Data-flow Synchronization. Although the matching algorithm may assign 
values to variables in the rule, it may not assign values to variables in the process 
state. To highlight this difference let us simply change the order of the previous 
match: 


[H|T] matched-to [1,2] 


H is now a variable in the process state and cannot be assigned a value by the 
matching algorithm. Instead, the match is postponed until H is no longer a vari- 
able. Assume that some other process eventually assigns the value 1 to the variable 
H. Now it is possible to repeat the matching as follows: 


[1|T] matched-to [1,2] 


Matching may now proceed to compare the variable T in the process with the term 
[2]. Again, this match cannot be completed because a variable on the left would 
need to be assigned a value for the two terms to be equivalent. Thus, the match 
is postponed again; this time until T is defined. 

This example demonstrates a concept called data-flow synchronization: Match- 
ing is forced to wait or suspend until sufficient data is available for the match to 
complete. Later we will see how this aspect of matching is used to synchronize 
process actions. In essence, it is the availability of data that determines when 
processes are able to execute. 


Matching is used to: 


• check input data
• pull structures apart
• synchronize events





Summary. Given the basic intuition we have developed, let us summarize 
the match algorithm. It is applied left-to-right textually and depth-first in term 
structures and may succeed, fail or suspend. If it succeeds, it generates a (possibly 
empty) set of assignments to variables in the rule head. For future reference, we 
will denote this set by the symbol Θ.

Integers, reals and strings match only if they are identical in both type and
value. Structures match if they are the same size and corresponding arguments
match. A variable V in a rule matches any term T in a process (including a
variable); in addition, the assignment V/T is added to Θ. An attempt to match a
variable in a process with a non-variable in a rule head causes a match to suspend. 
All other matches fail. 
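
The summary above can be turned into a short Python sketch. It is a model under stated assumptions, not the Strand implementation: numbers and strings are Python values, tuples are Python tuples, list cells are a hypothetical Cons pair with NIL for the empty list, and variables on the process side are assumed to be unassigned. The sketch also stops at the first argument that does not succeed, and it relies on the restriction that each variable occurs only once in the head. The dictionary theta plays the role of the set of assignments.

```python
from collections import namedtuple


class Var:
    """A variable, identified by name."""

    def __init__(self, name):
        self.name = name


class _Nil:
    def __repr__(self):
        return "[]"


Cons = namedtuple("Cons", "head tail")  # one list cell, [Head|Tail]
NIL = _Nil()                            # the empty list, []


def list_(*items, tail=NIL):
    """Build the list [item1,item2,...|tail] out of Cons cells."""
    for item in reversed(items):
        tail = Cons(item, tail)
    return tail


def match(proc, head, theta):
    """Match a process term against a head term: succeed, fail or suspend."""
    if isinstance(head, Var):
        theta[head.name] = proc  # a head variable matches anything
        return "succeed"
    if isinstance(proc, Var):
        return "suspend"         # wait until the process variable has a value
    if isinstance(proc, tuple) and isinstance(head, tuple):
        # Structures must be of the same kind (tuple or list cell) and size.
        if type(proc) is not type(head) or len(proc) != len(head):
            return "fail"
        for p, h in zip(proc, head):
            outcome = match(p, h, theta)
            if outcome != "succeed":
                return outcome
        return "succeed"
    # Numbers and strings must agree in both type and value (1 is not 1.0).
    if type(proc) is type(head) and proc == head:
        return "succeed"
    return "fail"


theta = {}
print(match(list_(1, 2), Cons(Var("H"), Var("T")), theta), theta)
print(match(Cons(Var("H"), Var("T")), list_(1, 2), {}))
print(match(1, 1.0, {}))
```

The first call destructures the list, the second suspends because a process variable meets a non-variable in the head, and the third fails because an integer never matches its real equivalent.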




2.5.3 Guard Execution 





Previous example programs introduced a number of guard tests (e.g., =\=). Strand 
provides a variety of these tests, which can be used to perform type checking, 
arithmetic comparison and term comparison. Like matching, these tests serve to 
constrain when a rule may be used to execute a process. 

Any number of tests can be written in a rule guard. They are executed from 
left-to-right textually after matching is complete. Each test may succeed, fail or 
suspend. If all tests in a guard succeed, then the guard as a whole succeeds; if 
any test fails or suspends, then the guard as a whole fails or suspends respectively. 
Tests generally suspend if they encounter a variable during evaluation. 

To illustrate how multiple tests are employed, consider the following rule: 


threshold(A,B) :— 
integer(A), integer(B), B > A | true. 


This rule defines the process behavior: 


“threshold processes terminate if both arguments are integers and the 
value of the second argument exceeds that of the first.” 


Consider execution of the following process using this rule: 
threshold(55, 72) 


Matching generates the assignments A/55 and B/72. Thus, the following guard is 
executed: 




integer(55), integer(72), 72 > 55 


Both the integer tests succeed because their arguments are indeed integers. The 
test ‘>’ also succeeds since 72 is indeed greater than 55. As all tests succeed, 
the guard itself succeeds and the process terminates. Now consider an alternative 
process: 


threshold(55,27) 


In this case the integer tests succeed but the ‘>’ test fails. Thus, the guard fails 
and some other rule must be used to reduce the process. Finally, consider the 
process: 


threshold(55,X) 


Here the second integer test cannot complete because its argument is the variable 
X. The guard therefore suspends indicating that there is not sufficient information 
to complete guard execution. Here again we see data-flow synchronization delaying 
process execution. 
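
The three outcomes traced above can be modeled outside Strand. The following is a minimal Python sketch, not Strand code: the names Outcome, Unbound, run_guard and threshold_guard are our own inventions for illustration. It assumes only what the text states, namely that tests run left to right and that the first test that does not succeed decides the result of the whole guard.

```python
from enum import Enum


class Outcome(Enum):
    SUCCEED = "succeed"
    FAIL = "fail"
    SUSPEND = "suspend"


class Unbound:
    """Stands for a variable that has not yet been given a value (hypothetical helper)."""

    def __init__(self, name):
        self.name = name


def is_integer(t):
    # A test that meets a variable cannot be decided yet.
    if isinstance(t, Unbound):
        return Outcome.SUSPEND
    return Outcome.SUCCEED if isinstance(t, int) else Outcome.FAIL


def greater(x, y):
    if isinstance(x, Unbound) or isinstance(y, Unbound):
        return Outcome.SUSPEND
    return Outcome.SUCCEED if x > y else Outcome.FAIL


def run_guard(tests):
    """Run zero-argument tests left to right; the first non-success decides."""
    for test in tests:
        outcome = test()
        if outcome is not Outcome.SUCCEED:
            return outcome
    return Outcome.SUCCEED


def threshold_guard(a, b):
    # Models the guard: integer(A), integer(B), B > A
    return run_guard([
        lambda: is_integer(a),
        lambda: is_integer(b),
        lambda: greater(b, a),
    ])
```

Calling threshold_guard(55, 72), threshold_guard(55, 27) and threshold_guard(55, Unbound("X")) reproduces the succeed, fail and suspend cases discussed above.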

The threshold example used two simple tests, integer and ‘>’. Strand provides 
an appropriate test for each data type and a variety of tests for arithmetic com- 
parison. The principal type-checking tests are: 


integer(T)  - T is an integer.
real(T)     - T is a real number.
string(T)   - T is a string.
list(T)     - T is a list.
tuple(T)    - T is a tuple.


The tests for arithmetic comparison are: 


X > Y   - X is greater than Y.
X < Y   - X is less than Y.
X =< Y  - X is less than or equal to Y.
X >= Y  - X is greater than or equal to Y.


Two further tests permit comparison of arbitrary terms: 


X == Y   - X and Y are identical.
X =\= Y  - X and Y are not identical.


Two terms are defined to be identical if they are the same number or string. In 
addition, two structures are identical if they are the same size and corresponding 
subterms are identical. Using these tests the following guards succeed: 


5 == 5, 53.6 > 2 
“abc” =\= “def”, abc == abc 
[1,{2,abc},3|[]] == [1,{2,abc},3], 15 =< 15 




In contrast, the following guards fail: 


5 == 5, 3.2 == 3.1999999 
53.6 > 2, 23.9 > 50 
abc == abc, {{{a},1},[],[2]} == {{{a},1},[],{2}}


Finally, the following guards suspend: 


15 > 7, [abc|X] == [abc,d]
[] == [], {1,2,[],4} =\= {1,2,[],Y}
{1,2,3} == {1,2,3}, 23.7 > X
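
The definition of identity can be turned into a short recursive procedure. The sketch below is written in Python rather than Strand, and its representation is our own assumption: Strand tuples become Python tuples, lists become chains of Cons cells ending in NIL, and Var marks an unbound variable. It also assumes that a comparison suspends as soon as a left-to-right traversal meets a variable, and that numbers of different Python types are not identical; the text does not settle either point.

```python
from dataclasses import dataclass

SUCCEED, FAIL, SUSPEND = "succeed", "fail", "suspend"


@dataclass(frozen=True)
class Var:
    name: str            # an unbound variable (hypothetical representation)


@dataclass(frozen=True)
class Cons:
    head: object         # first element of a list
    tail: object         # rest of the list: another Cons, NIL, or a Var


class _Nil:
    def __repr__(self):
        return "[]"


NIL = _Nil()             # the empty list


def lst(*items, tail=NIL):
    """Build a list term; lst('abc', tail=Var('X')) models [abc|X]."""
    for item in reversed(items):
        tail = Cons(item, tail)
    return tail


def identical(x, y):
    """Model of X == Y: returns SUCCEED, FAIL or SUSPEND."""
    if isinstance(x, Var) or isinstance(y, Var):
        return SUSPEND
    if isinstance(x, Cons) and isinstance(y, Cons):
        pairs = [(x.head, y.head), (x.tail, y.tail)]
    elif isinstance(x, tuple) and isinstance(y, tuple):
        if len(x) != len(y):
            return FAIL                      # structures of different size
        pairs = list(zip(x, y))
    else:
        # numbers, strings and the empty list: same type and same value
        return SUCCEED if type(x) is type(y) and x == y else FAIL
    for a, b in pairs:
        outcome = identical(a, b)
        if outcome != SUCCEED:
            return outcome                   # first non-success decides
    return SUCCEED


def not_identical(x, y):
    """Model of X =\\= Y: the inverse of identical, but suspension is preserved."""
    outcome = identical(x, y)
    if outcome == SUSPEND:
        return SUSPEND
    return FAIL if outcome == SUCCEED else SUCCEED
```

For instance, identical(lst("abc", tail=Var("X")), lst("abc", "d")) suspends, mirroring the guard [abc|X] == [abc,d] listed above.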


Exercises 
2.9 Which of the following assignments are legitimate: 


(a) X := ok_bub

(b) Y := ok_bub(1,2,3,[X|Xs],Y,Z) 

(c) 57 := [my,little,assignment] 

(d) Variable1 := Ok, Variable1 := NotOk, Variable1 := 5 
(e) Ok := 7, Variable1 := Ok 

(f) Cookies := {chocolate, almond and nuts} 


(g) - 
(h) Yup := "Nope Nope := Hum, Hum := Um, Um := Nope 


2.10 Show the structures involved in each of the following matches: 


(a) process1(U1,2,U2,4) matched-to process2(1,V1,[H|T], V2) 
(b) process1([H,1|T],U1) matched-to process1([B,C|Y],{1,Z})
(c) p({U2,{U3}}) matched-to p({1,{[V4|V5]}})


2.11 For each of the matches in 2.10 give the sequence of assignments made during 
execution of the match algorithm. 


2.12 For each of the matches in 2.10 that must be postponed, give two possi- 
ble assignments to variables in the process which will allow each match to 
complete satisfactorily. 


2.13 For each of the matches in 2.10 that must be postponed, give two possible 
assignments to variables in the process which will cause the match algorithm 
to fail. 




2.6 Playing with Structures 


We will now show some simple list and tuple processing programs. As before, each 
program is preceded by a short specification in English. The exercises at the end 
of this section provide example processes, which the reader is advised to execute 
by hand. 

Sum List. Compute the Sum of the elements of a list of numbers L. 


sum(L,Sum) :- sum1(L,0,Sum).        % initialize accumulator to 0

sum1([X|Xs],A,Sum) :-               % destructure list
    A1 is A + X,                    % add head to accumulator
    sum1(Xs,A1,Sum).                % sum rest of list
sum1([],A,Sum) :-                   % end of list encountered
    Sum := A.                       % return sum


This program consists of two process definitions. The sum process simply changes 
state to a sum1 process with an additional argument. This argument is initially 
zero and represents an accumulated sum. The sum1 process sums a list of numbers 
by adding each number in the list to the accumulator. When the end of the list is 
reached, the value of the accumulator is assigned to the result variable Sum. Note 
the use of matching to destructure the list at each stage of the algorithm. 

The sum1 process definition is the first we have encountered that employs the 
important programming concept of recursion. Definitions of this type use two
forms of rule which either perform recursion or specify a stopping condition. A
recursive rule continues to execute the current process with a modified data state; 
the first rule of sum1 is an example of this rule type. A rule specifying a stopping 
condition may terminate recursion when some condition is satisfied. The second 
rule of sum1 is a rule of this type; it causes the sum1 process to terminate at the 
end of the input list L. Recursion is the mechanism used in Strand to implement 
iteration. 

Member. Given a value X and a list L, return as R the string true if X is in L 
and false otherwise. 


member(X,[X1|Rest],R) :-            % destructure list
    X =\= X1 |                      % compare head with X
    member(X,Rest,R).               % not found: check tail
member(X,[X1|_],R) :-               % destructure list
    X == X1 |                       % compare head with X
    R := true.                      % found: return true
member(_,[],R) :-                   % reached end of list
    R := false.                     % return false


This recursive program inspects each element of the list L until it either encounters 
the value X or reaches the end of the list. If X is found, R is assigned the value 
true; if the end of the list is reached, R is assigned the value false. 

Reverse. Given a list L, construct a new list R in which the elements of L 
occur in the reverse order. 


reverse(L,R) :- reverse1(L,[],R).   % initialize accumulator to []

reverse1([X|Xs],A,R) :-             % select list head
    reverse1(Xs,[X|A],R).           % reverse remainder
reverse1([],A,R) :-                 % end of list
    R := A.                         % return accumulator


This recursive program operates in a similar manner to sum. However, its accu- 
mulator is a list data structure rather than a number. At each recursive step, 
the process selects the head X of the list L and adds it to the beginning of the 
accumulator. Note how the accumulated list is carried forward at each recursive 
call and is returned when the process terminates. 

Remove. Given two lists L1 and L2, construct a list Rs that contains all 
elements in the list L1 that are not in the list L2. 


remove([X|L1],L2,Rs) :-             % access head of L1, X
    member(X,L2,T),                 % determine if X in L2
    remove1(T,X,L1,L2,Rs).          % maybe add X to Rs
remove([],_,Rs) :-                  % end of L1
    Rs := [].                       % terminate Rs

remove1(true,_,L1,L2,Rs) :-         % X in L2: do not add to Rs
    remove(L1,L2,Rs).               % check rest of L1
remove1(false,X,L1,L2,Rs) :-        % not in L2
    Rs := [X|Rs1],                  % add X to Rs
    remove(L1,L2,Rs1).              % check rest of L1


This program consists of two mutually recursive process definitions. The first 
selects an element X from the list L1 and uses member to determine whether X is 
in the list L2. The remove1 process inspects the result T of member and takes the 
appropriate action. Note the assignment Rs := [X|Rs1] in the last rule; this is used 
to incrementally generate the result list Rs. 
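
Readers who know a sequential language may find it helpful to see the same mutual recursion in that setting. The following Python sketch is only an analogy and is not how Strand executes: it uses generators so that each element of the result is produced as soon as it is known, which loosely mirrors the incremental assignment Rs := [X|Rs1]. The function names follow the Strand program; returning values instead of assigning to a result variable is our adaptation.

```python
def member(x, items):
    """True if x occurs in items; one branch per member rule."""
    if not items:                 # member(_,[],R) :- R := false.
        return False
    head, rest = items[0], items[1:]
    if x == head:                 # X == X1 | R := true.
        return True
    return member(x, rest)        # X =\= X1 | member(X,Rest,R).


def remove(l1, l2):
    """Yield the elements of l1 that are not in l2, one at a time."""
    if not l1:                    # remove([],_,Rs) :- Rs := [].
        return
    x, rest = l1[0], l1[1:]
    yield from remove1(member(x, l2), x, rest, l2)


def remove1(found, x, l1, l2):
    """Inspect the result of member and decide whether to emit x."""
    if not found:
        yield x                   # Rs := [X|Rs1]
    yield from remove(l1, l2)     # check rest of L1
```

Wrapping a call in list(...), for example list(remove(["a", "b"], ["b"])), collects the complete result.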

Sum Tree. Compute the Sum of a binary tree of integers T. A binary tree 
can be represented in Strand using terms of the form: 


{LeftTree,RightTree}


where the LeftTree and RightTree are either terms of the same form or integers. 
For example, the term: 


{{5,7}, {16,3}} 


represents the tree:


            .
          /   \
         .     .
        / \   / \
       5   7 16  3


The following program sums the leaves of a tree represented in this manner. 


sum_tree({L,R},Sum) :-
    sum_tree(L,A1),
    sum_tree(R,A2),
    Sum is A1 + A2.
sum_tree(N,Sum) :-
    integer(N) |
    Sum := N.


It is interesting to compare this program with the sum program presented at the 
beginning of this section. Like the sum program, this process definition recursively 
decomposes a structure. However, instead of making a single recursive call at each 
step, it forks processes to sum both the left and right subtrees concurrently. 
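
To make the idea of forking concrete, here is a small Python sketch in which each subtree is summed by its own thread. This is an illustration under our own assumptions: Strand processes are not operating-system threads, and the nested work function and results list are devices of the sketch. A tree is a two-element Python tuple or an integer leaf.

```python
import threading


def sum_tree(tree):
    """Sum the integer leaves of a binary tree given as nested pairs."""
    if isinstance(tree, int):          # sum_tree(N,Sum) :- integer(N) | Sum := N.
        return tree
    left, right = tree                 # sum_tree({L,R},Sum) :- ...
    results = [None, None]

    def work(index, subtree):
        results[index] = sum_tree(subtree)

    # Fork one worker per subtree; both may run at the same time.
    workers = [
        threading.Thread(target=work, args=(0, left)),
        threading.Thread(target=work, args=(1, right)),
    ]
    for w in workers:
        w.start()
    # Joining plays the role of "Sum is A1 + A2" waiting for A1 and A2.
    for w in workers:
        w.join()
    return results[0] + results[1]
```

In Strand no explicit join is written: the process Sum is A1 + A2 simply suspends until both values are available.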




Exercises 


2.14 Execute the following processes by hand, writing down the process actions 
performed at each step and the process pool that results. 


(a) sum([1,2,3],R)
(b) member(john,[mary,john,peter],R)
(c) member(joe,[mary,john],R)
(d) reverse([1,age(32),fred],R)
(e) remove([joe,mary,john],[joe,peter],R)
(f) sum_tree({{5,7},{16,3}},R)


2.15 Write a program that computes the length of a list L. 


2.16 Write a program that takes a list of integers L and computes a new list in 
which each element is multiplied by 10.


2.17 Write a program that removes duplicate elements of a list. 
2.18 Write a program that forms the intersection of two lists. 
2.19 Write a program that computes the number of leaves in a binary tree. 


2.20 Write a program that computes the maximum depth of an n-ary tree. 


2.7 A Programming Convenience 


It is common practice in Strand programming to attempt to verify that two process 
arguments are identical using the guard test ==. As a shorthand for this test, 
Strand allows you to write any variable V multiple times in a rule head. This is a 
shorthand for equality tests in the following sense: After the first, each subsequent 
occurrence of V in the rule head is translated into a new unique variable, Vi. An
explicit == test is then added to the beginning of the rule guard to test the
equality of V and Vi. The order of these new tests corresponds to the order
in which multiple occurrences are encountered during a left-to-right, depth-first 
traversal of the rule head. 

For example, recall that the second rule of the member program given in Sec- 
tion 2.6 is written as: 


member(X,[X1|Xs],R) :- X == X1 | R := true.


It can be written more concisely as:


member(X,[X|Xs],R) :- R := true.


This notational convenience relaxes our original definition of a Strand rule which 
stated that no variable may occur more than once in a rule head. 
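
The translation just described is mechanical, and a sketch of it may help. The Python below is our own model, not part of any Strand compiler: a rule head is a nested Python tuple or list of strings, a string beginning with an upper-case letter is taken to be a variable, and fresh names of the form V_1, V_2 are assumed not to clash with names already in the rule.

```python
def is_variable(term):
    # Assumption of this sketch: variables are strings starting upper-case.
    return isinstance(term, str) and term[:1].isupper()


def linearize(head):
    """Rename repeated variables and return (new_head, equality_tests).

    Tests are generated in the order in which repeats are met during a
    left-to-right, depth-first traversal of the head.
    """
    seen = set()
    counts = {}
    tests = []

    def walk(term):
        if isinstance(term, (list, tuple)):
            return type(term)(walk(t) for t in term)
        if not is_variable(term):
            return term
        if term not in seen:          # first occurrence is kept as written
            seen.add(term)
            return term
        counts[term] = counts.get(term, 0) + 1
        fresh = f"{term}_{counts[term]}"
        tests.append(f"{term} == {fresh}")
        return fresh

    return walk(head), tests
```

Applied to ("member", "X", ["X", "Xs"], "R"), the function returns the head ("member", "X", ["X_1", "Xs"], "R") together with the single test X == X_1, which corresponds to the explicit form of the member rule.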


2.8 A Strand Interpreter 


Let us bring together the program execution concepts introduced in previous sec- 
tions by outlining a simple Strand interpreter. This provides a machine-oriented 
view of how programs execute; readers who prefer a more abstract exposition are 
referred to Section 2.9. 

Algorithm 2.1 implements a Strand interpreter; it is written in a Pascal-like 
notation where block structure is indicated by indentation. The algorithm takes 
as input an initial set of processes and a program S. It uses a composite match 
procedure CMatch which implements both the Strand matching and guard execu- 
tion algorithms. A call to CMatch takes as arguments a process P and a rule R. It 
returns: 


• θ if the match succeeds giving the set of assignments θ, and the guard also
  succeeds.


• suspend in all other cases.


interpreter()
    for each initial process P
        put_process(P)                          { put P in process pool }
    repeat
        P := get_process()                      { get a process from pool }
        if (is_predefined(P)) execute(P)        { predefined process }
        else reduce(P)                          { otherwise, do reduction }
    until (empty pool)


reduce(P)
    COMMIT := False                             { initialize Flags }
    repeat
        R := pick_untried_rule(P,S)             { get a rule from S }
        R1 := fresh_copy(R)                     { copy the rule to R1 }
        M := CMatch(P,R1)                       { execute match/guard }
        if (M = θ) then                         { CMatch succeeds? }
            COMMIT := True                      { finished looking }
            spawn_body(R1,θ)                    { add processes to pool }
    until (COMMIT) or (all_rules_tried(P))      { reduced or done }
    if (not COMMIT) then put_process(P)         { return process to pool }


Algorithm 2.1: A Strand Interpreter. 




Compare Algorithm 2.1 to the brief operational description given at the beginning 
of Section 2.4: 


Computation proceeds by repeatedly selecting a process and removing 
it from the pool. If the process is a predefined process (such as X := Y), it 
is executed immediately; otherwise a reduction attempt is made. This 
involves selecting a rule from the program, matching the process to the 
rule head and executing the rule guard. If the preconditions specified 
by the head and guard are satisfied, then the process commits. This 
causes new copies of the processes defined in the rule body to be added 
to the process pool. 


At each step, the interpreter nondeterministically selects a process from the 
process pool and a rule from the program. The manner in which the process and 
rule are chosen is not specified and cannot be assumed. The Strand programmer 
must ensure that all possible choices will result in the correct solution being com- 
puted. This is actually much easier than one might expect but requires a different 
viewpoint when programming. In C or Pascal, programs are organized around 
the flow of control, i.e., the sequence of statement executions. In Strand, we are 
instead concerned with the flow of information, i.e., data. The order in which 
processes execute is determined only by data availability, not their textual order 
in a program. 


Organize computation around the flow of data, not the flow of control.





Recall that when discussing matching we described how it can be necessary to 
postpone a match if data is not available. Algorithm 2.1 uses busy waiting to 
achieve this postponement: A process which suspends is returned to the process 
pool. It is this mechanism that synchronizes process actions. 

A common source of error in Strand is to write process definitions that are 
incomplete in that on certain inputs either matching or guard evaluation fails for 
all rules. As a result, the process suspends forever. The Strand debugger can be 
used to detect this process failure. 

Finally, note that the interpreter is an abstract Strand specification: Practical 
implementations employ optimizations that, for example, prevent busy waiting 
and reduce the number of rules matched at each step. 
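
The shape of Algorithm 2.1 can be tried out in a conventional language. The Python sketch below is a deliberately reduced model and makes several assumptions of its own: the rules of the sum program are hand-coded in reduce rather than matched by a general CMatch, a list cell [X|Xs] is the Python pair (X, Xs), the predefined process is supports addition only, and a step limit is added so that a pool in which every process suspends does not loop forever. It keeps the essential features: a process pool, an unspecified (here random) choice of process, and busy waiting for suspended processes.

```python
import random

SUSPEND = object()           # marker: the process cannot reduce yet
NIL = "[]"                   # the empty list


class Var:
    """A single-assignment variable."""

    def __init__(self):
        self.bound = False
        self.value = None


def deref(term):
    """Follow bindings until a value or an unbound variable is reached."""
    while isinstance(term, Var) and term.bound:
        term = term.value
    return term


def bind(var, value):
    var = deref(var)
    assert isinstance(var, Var), "single-assignment rule violated"
    var.bound, var.value = True, value


def execute(proc):
    """Predefined processes (':=', X, Y) and ('is', Out, A, B) for Out is A + B."""
    if proc[0] == ":=":
        bind(proc[1], proc[2])
        return []
    _, out, a, b = proc
    a, b = deref(a), deref(b)
    if isinstance(a, Var) or isinstance(b, Var):
        return SUSPEND                       # operands not yet available
    bind(out, a + b)
    return []


def reduce(proc):
    """Hand-coded rules of the sum program; returns body processes or SUSPEND."""
    if proc[0] == "sum":                     # sum(L,Sum) :- sum1(L,0,Sum).
        _, items, total = proc
        return [("sum1", items, 0, total)]
    _, items, acc, total = proc              # otherwise a sum1 process
    items = deref(items)
    if isinstance(items, Var):               # both sum1 heads need the list
        return SUSPEND
    if items == NIL:                         # sum1([],A,Sum) :- Sum := A.
        return [(":=", total, acc)]
    head, tail = items                       # sum1([X|Xs],A,Sum) :- ...
    acc1 = Var()                             # fresh variable, as in rule copying
    return [("is", acc1, acc, head), ("sum1", tail, acc1, total)]


def run(pool, seed=0, max_steps=10_000):
    """Return True if the pool empties, False if the step limit is reached."""
    rng = random.Random(seed)
    for _ in range(max_steps):
        if not pool:
            return True
        proc = pool.pop(rng.randrange(len(pool)))
        result = execute(proc) if proc[0] in (":=", "is") else reduce(proc)
        if result is SUSPEND:
            pool.append(proc)                # busy waiting: back into the pool
        else:
            pool.extend(result)
    return not pool
```

Starting from the pool [("sum", (1, L), R), (":=", L, (2, (3, NIL)))] with fresh variables L and R, a call to run empties the pool whatever seed is chosen, and deref(R) then yields the sum. Different seeds correspond to different, equally valid, orders of execution.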




Executing a Program. Recall the sum program presented in Section 2.6 
which computed the sum of a list of numbers. 


sum(L,Sum) :- sum1(L,0,Sum).        % initialize accumulator to 0

sum1([X|Xs],A,Sum) :-               % destructure list
    A1 is A + X,                    % add head to accumulator
    sum1(Xs,A1,Sum).                % sum rest of list
sum1([],A,Sum) :-                   % end of list encountered
    Sum := A.                       % return sum


Table 2.2 traces the execution of this program using the interpreter described by 
Algorithm 2.1. The inputs provided to the interpreter are the sum program and 
the following set of initial processes: 


sum([1|L],R), L := [2,3]. 


Table 2.2 has the same form as Table 2.1; its four columns specify a computation 
step number, the process chosen for reduction, the result of the step, and the 
new process pool. Unlike Table 2.1, the result of a computation step can also be 
suspend. This indicates that a process cannot yet reduce and must be postponed. 

This computation demonstrates all essential aspects of Strand program execu- 
tion. At various steps there are a number of processes in the process pool; these 
can be solved in any order or concurrently. Thus, at step 2 there are potentially 
three processes which can be solved at the same time. Let us consider some of the 
more important steps in the computation more closely. 

At step 3, the process sum1(L,A1,R) suspends. To see why this occurs consider 
the effect of matching the process with the head of each of the two sum1 rules. 
The first rule matches the variable L against the value [X|Xs]; the second matches 
this variable against the value []. As L is a variable in the data state of the process, 
both matches must be postponed. As the process is unable to commit using any 
rule, it is placed back into the process pool. Thus, step 3 illustrates a process 
suspending because data is unavailable. Step 8 shows a similar situation in which
the predefined process ‘is’ requires the value of the variable A2 before it is able to
execute.

At step 4, the selection of an assignment process causes the variable L to 
take the value [2,3]. If the process that suspended in step 3 is now retried, it 
may reduce using the first sum1 rule; this occurs at step 5. Thus, steps 3, 4 
and 5 demonstrate data-flow synchronization; the sequence in which processes are 
executed is determined by the availability of the list L. 

Closer inspection of the process pool after step 6 reveals that each occurrence 
of the variable A1 is distinguished by renaming, e.g., A2, A3. This renaming has 
been performed to stress that the variables are distinguishable because of rule 
copying performed in Algorithm 2.1. Copying has the effect of generating new 
variables whenever a rule is used in a reduction attempt. 




Table 2.2: Executing the sum process.


Step  Pick  Result         Process Pool
 0     -    -              sum([1|L],R), L := [2,3]
 1     1    change state   sum1([1|L],0,R), L := [2,3]
 2     1    change state   A1 is 0 + 1, L := [2,3],
            + fork         sum1(L,A1,R)
 3     3    suspend        no change
 4     2    terminate      A1 is 0 + 1, sum1([2,3],A1,R)
 5     2    change state   sum1([3],A2,R), A1 is 0 + 1,
            + fork         A2 is A1 + 2
 6     2    terminate      sum1([3],A2,R), A2 is 1 + 2
 7     1    change state   A3 is A2 + 3, sum1([],A3,R),
            + fork         A2 is 1 + 2
 8     1    suspend        no change
 9     3    terminate      A3 is 3 + 3, sum1([],A3,R)
10     1    terminate      sum1([],6,R)
11     1    terminate      R := 6
12     1    terminate      <empty> and R=6




The computation ends at step 12 when the process pool is empty. Intuitively, 
the result of the computation is the value 6. 

Conclusion. As we have pointed out previously, variables behave like com- 
munication channels. In Table 2.2, when the variable L is assigned the value [2,3] 
in step 4, the value becomes available (or is sent) to the sum1 process. Thus, 
variables provide a simple message-passing abstraction in which assignment cor- 
responds to sending a message and matching corresponds to receiving a message. 
Although simple, this abstraction is powerful and can be used to express a wide 
variety of complex communication protocols. 


• Variable = Communication Channel


• Assignment = Output (message send)


• Match = Input (message receive)
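
The correspondence summarized above can be imitated with ordinary threads. The following Python sketch is an analogy only; the class name Channel and its methods assign and read are our own, and Strand itself needs no such class because every variable already behaves this way. Assignment sends the one message the channel can carry, and a reader blocks until that message arrives.

```python
import threading


class Channel:
    """A single-assignment variable used as a one-shot communication channel."""

    def __init__(self):
        self._ready = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def assign(self, value):
        """Output: give the variable its value (allowed once only)."""
        with self._lock:
            if self._ready.is_set():
                raise RuntimeError("variable already has a value")
            self._value = value
            self._ready.set()

    def read(self, timeout=None):
        """Input: wait until the value is available, then return it."""
        if not self._ready.wait(timeout):
            raise TimeoutError("no value was assigned in time")
        return self._value
```

A consumer thread that calls read before the producer has called assign simply waits, just as the sum1 process in Table 2.2 waits for L to receive the value [2,3].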





Exercises 


2.21 Give an alternative set of program execution steps for the execution of the 
sum1 program that computes the same answer but involves no process sus- 
pension. 


2.22 Specify an initial set of processes that cause the computation to end in a 
state where all processes suspend and no progress can be made. 


2.23 Define an assignment process that when added to the computation in Exer- 
cise 2.22 permits the computation to terminate. 


2.24 Execute the following processes by hand using the programs designed in
Section 2.6. At each step, write down the sequence of CMatch results assuming
that clauses are tried in their textual order. Also specify the state of the
COMMIT flag following each clause try.

(a) reverse([3,2|X],R)
(b) remove([1,2],[1],R)
(c) sum_tree({{1,2},3},R)




2.9 The Operational Model 


We now present a more abstract view of Strand program execution; this section 
can be skipped if desired. 

Composite Matching. The procedure CMatch takes as arguments a process
p and a rule of the form R = H :- G | B. It executes the match algorithm followed
by guard execution and returns:


• θ if match(p,H) = θ ∧ Gθ = true


• suspend otherwise.


States. The state of a Strand computation is a multi-set of processes. Every 
process either is or is not an assignment process of the form X:=Y. 

Transitions. A transition rule specifies a mapping between states. Execution 
of a program P may include the following transitions: 


Reduction:  {p_1, ..., p_i, ..., p_n} → {p_1, ..., Bθ, ..., p_n}
    if p_i ≠ X:=Y ∧ CMatch(p_i,R′) = θ, where R′ = H :- G | B is a fresh copy of R ∈ P.


Assignment: {p_1, ..., X:=Y, ..., p_n} → {p_1, ..., p_n}[X/Y]


Suspension: {p_1, ..., p_n} → <suspend>
    if n > 0 ∧ ∀p_i (p_i ≠ X:=Y ∧ (∀R ∈ P, CMatch(p_i,R) ≠ θ))


Note that in the reduction rule, if B is the empty body, then no processes are added 
to the new state. Note also that in the assignment rule X must be a variable that 
does not occur in Y. 

Computations. A computation is a sequence of transitions that ends in a 
terminal state (i.e., one in which no further rules can be applied). A computation 
that ends in the state { } is called a successful computation. A computation that 
ends in the state <suspend> is termed a suspending computation. 


2.10 A Programming Example 


We conclude this introduction to Strand by gradually developing the solution to 
a programming problem. This opportunity is used to introduce a program design 
methodology termed stepwise refinement. The problem to be solved is: 


Problem 2.1: “Given a ground term X (i.e., one that contains no 
variables), determine how many integers and strings are contained in 
X.”







The program we will develop takes a term as input and returns the number of 
integers (Icnt) and the number of strings (Scnt) in the term. Here is a typical 
process that might execute using this program: 


scan({1,2,[],[ab,cd],3.2}, Icnt, Scnt)


One way to count elements is to initialize an accumulator at zero and increment 
it each time an element is detected. The accumulated value is returned when 
nothing remains to be counted. Thus, our first refinement of the problem is to 
add accumulators that are initially zero: 


scan(T,Is,Ss) :- scan1(T,0,Is,0,Ss).


To refine the scan1 process further we consider all possible inputs to the process 
and write a rule for each. Let us begin with the case where the input term T is an 
integer. In this case the output count of the number of integers is just one more 
than the number of integers counted so far; the number of strings counted remains 
unchanged: 


scan1(T,Ii,Io,Si,So) :-
    integer(T) | Io is Ii + 1, So := Si.


A similar rule is used when the term T is a string: 


scan1(T,Ii,Io,Si,So) :-
    string(T) | So is Si + 1, Io := Ii.


If the input term T is a real number or an empty list, the number of integers and 
strings remain unchanged: 


scan1(T,Ii,Io,Si,So) :-
    real(T) | Io := Ii, So := Si.
scan1([],Ii,Io,Si,So) :-
    Io := Ii, So := Si.


Only two cases remain: a non-empty list and a tuple. The number of integers in 
a list is the number in the head of the list plus the number in the rest of the list. 
Since the list may contain terms that contain integers, it is not sufficient just to 
test the list head to see if it is an integer. Instead, it is necessary to scan the head. 
We begin by outlining the appropriate process action: 


scan1([Head|Rest], ...) :-
    scan1(Head, ...),
    scan1(Rest, ...).


Matching is used to determine if the input term is a non-empty list and to decom- 
pose it into constituent parts. The number of integers in the list is the number in 
the Head plus the number in the Rest. This number must be added to the number 
of integers detected in the term so far and then output as the result. We now 
refine the above outline to include this counting: 


scan1([Head|Rest],Ii,Io, ...) :-
    scan1(Head,Ii,I1, ...),
    scan1(Rest,I1,Io, ...).


The first scan1 process in the rule body takes the number of integers so far (Ii)
as an argument. It adds the number of integers in the Head of the list to the
accumulator Ii and generates the result I1. This new number is communicated to
the second scan1 process which adds the number of integers in the Rest of the
list. This second process eventually generates the final output (Io). A similar
refinement handles the counting of strings:


scan1([Head|Rest],Ii,Io,Si,So) :-
    scan1(Head,Ii,I1,Si,S1),
    scan1(Rest,I1,Io,S1,So).




Finally, consider the case where the input term is a tuple. Here we delegate the 
responsibility for iterating over the tuple arguments to a new process: 


scan1(T,Ii,Io,Si,So) :-
    tuple(T) | scan_args(T,Ii,Io,Si,So).


Since iteration is expressed in Strand using recursion, we must consider what 
stopping conditions are necessary. In outline, the scan_args process is refined as: 


scan_args(Tuple,...) :-
    not_all_arguments_done |
    extract_next_arg(Tuple,Arg),
    scan1(Arg,...),
    scan_args(Tuple,...).
scan_args(Tuple,...) :-
    all_done | true.


One obvious way to stop is to detect that recursion has reached the last argument. 
Two numbers are necessary to achieve this, the current argument number and 
the last argument number. The scan_args process must therefore be provided
with these values as inputs. We refine the process one step further by adding the 
appropriate stopping conditions: 


scan_args(Tuple,On,Last,...) :-
    On =< Last |
    extract_next_arg(On,Tuple,Arg),
    scan1(Arg,...),
    On1 is On + 1,
    scan_args(Tuple,On1,Last,...).
scan_args(Tuple,On,Last,...) :-
    On > Last | true.


This process specification uses the first rule to consider each argument beginning 
with the first. The current argument number (On) is used to extract an argument 
(Arg) from the tuple (Tuple). The argument is then scanned to find the number 
of integers and strings it contains; the remaining arguments are considered recur- 
sively. The process terminates when the current argument number (On) exceeds 
the number of arguments in the tuple (Last) by virtue of the second rule. This 
indicates that all the arguments have been considered. 

At this point we should notice an error in the program: The scan1 process 
does not pass initial values for On and Last to the scan_args process. To correct
this error in the design it is necessary to backtrack and redefine the appropriate 
rule: 


scan1(T,Ii,Io,Si,So) :-
    tuple(T) |
    length(T,A), scan_args(T,1,A,Ii,Io,Si,So).




This definition uses the predefined length process to determine the size of the input 
tuple T. Scanning of the arguments begins with the first argument (1) and ends 
at the last argument (A). 


Returning to the scan_args process, we now decide how to extract an argument
from a tuple. This is achieved using a predefined process get_arg(N,T,A) which
makes A the Nth argument of tuple T. Here is the refined specification:


scan_args(Tuple,On,Last,...) :-
    On =< Last |
    get_arg(On,Tuple,Arg),
    scan1(Arg,...),
    On1 is On + 1,
    scan_args(Tuple,On1,Last,...).
scan_args(Tuple,On,Last,...) :-
    On > Last | true.


To complete the process definition we refine it to update the output counts Io and
So. Here we learn from our experience in designing the list processing rule and 
immediately thread the appropriate communication channels: 


scan_args(Tuple,On,Last,Ii,Io,Si,So) :-
    On =< Last |
    get_arg(On,Tuple,Arg),
    On1 is On + 1,
    scan1(Arg,Ii,I1,Si,S1),
    scan_args(Tuple,On1,Last,I1,Io,S1,So).
scan_args(Tuple,On,Last,Ii,Io,Si,So) :-
    On > Last | Io := Ii, So := Si.


At this point the design process is complete; every statement is a legal Strand
statement, and the program can be compiled and executed. We could stop here, and
in fact many programmers would not think to continue. However, it is important 
not to be satisfied with a program that just works. We must also strive for elegance 
and simplicity; these will lead to more efficient programs and programs that are 
easier to maintain and reuse. The best programmers are those who continue to 
search for an elegant, concise formulation of their problem. 


Let us reconsider the scan_args process for a moment; it iterates over the 
arguments of a tuple in order and spawns concurrent processes to accumulate the 
appropriate counts. It is not necessary to spawn these processes in increasing 
argument order. They could be spawned in decreasing order or in fact, in any 
order at all. Let us experiment with decreasing order: We begin with the Nth 
argument and stop at zero. In outline, the process is rewritten as: 


scan_args(Tuple,On,...) :-
    On > 0 |
    get_arg(On,Tuple,Arg),
    scan1(Arg,...),
    On1 is On - 1,
    scan_args(Tuple,On1,...).
scan_args(Tuple,0,...).


Notice that the new process has one less argument; it uses a test against a constant 
in the first rule and matching in the last rule. It is more concise, elegant and 
efficient than the original program. Program 2.1 presents the completed scan 
program and includes a fully refined version of this process definition. 


scan(T,Icnt,Scnt) :- scan1(T,0,Icnt,0,Scnt).


scan1(T,Ii,Io,Si,So) :-
    integer(T) | Io is Ii + 1, So := Si.
scan1(T,Ii,Io,Si,So) :-
    string(T) | So is Si + 1, Io := Ii.
scan1(T,Ii,Io,Si,So) :-
    real(T) | Io := Ii, So := Si.
scan1([],Ii,Io,Si,So) :-
    Io := Ii, So := Si.
scan1([Head|Rest],Ii,Io,Si,So) :-
    scan1(Head,Ii,I1,Si,S1), scan1(Rest,I1,Io,S1,So).
scan1(T,Ii,Io,Si,So) :-
    tuple(T) |
    length(T,A), scan_args(T,A,Ii,Io,Si,So).


scan_args(Tuple,On,Ii,Io,Si,So) :-
    On > 0 |
    get_arg(On,Tuple,Arg),
    scan1(Arg,Ii,I1,Si,S1),
    On1 is On - 1,
    scan_args(Tuple,On1,I1,Io,S1,So).
scan_args(Tuple,0,Ii,Io,Si,So) :-
    Io := Ii, So := Si.


Program 2.1: Solution to Problem 2.1 


This program development has employed four predefined processes. Strand 
provides a variety of useful processes to simplify program design. A list of the 
processes available and a description of how these operate is given in Appendix A. 
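
When executing a program of this kind by hand, it can be useful to have an independent check of the final counts. The Python sketch below mirrors the structure of Program 2.1 under our own mapping of data types: Strand integers, reals and strings become Python int, float and str, lists become Python lists, and tuples become Python tuples. Accumulators are threaded through the calls in the same way, but values are returned rather than assigned to output variables, and everything runs sequentially.

```python
def scan(term):
    """Return (integer_count, string_count) for a ground term."""
    return scan1(term, 0, 0)


def scan1(term, ii, si):
    """One branch per scan1 rule; ii and si are the counts so far."""
    if isinstance(term, int):
        return ii + 1, si                    # integer(T) | Io is Ii + 1
    if isinstance(term, str):
        return ii, si + 1                    # string(T) | So is Si + 1
    if isinstance(term, float):
        return ii, si                        # real(T): counts unchanged
    if isinstance(term, list):
        if not term:
            return ii, si                    # empty list: counts unchanged
        i1, s1 = scan1(term[0], ii, si)      # scan the head
        return scan1(term[1:], i1, s1)       # then the rest
    if isinstance(term, tuple):
        return scan_args(term, len(term), ii, si)
    raise TypeError(f"unsupported term: {term!r}")


def scan_args(tup, on, ii, si):
    """Visit tuple arguments in decreasing order, stopping at zero."""
    if on == 0:
        return ii, si
    i1, s1 = scan1(tup[on - 1], ii, si)      # get_arg is 1-based
    return scan_args(tup, on - 1, i1, s1)
```

For the process shown at the start of this section, scan((1, 2, [], ["ab", "cd"], 3.2)) returns (2, 2): two integers and two strings.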




2.11 Summary 


In this chapter we have introduced the basic Strand programming concepts and 
have described how programs execute. Computation is performed by a pool of in- 
teracting, concurrent processes. These processes communicate by assigning values 
to shared variables and synchronize by waiting for data to become available. Each 
process can perform only three basic actions: terminate, change state or fork new 
processes. These actions are specified by the collection of rules that constitute the 
program. 


Exercises 


2.25 Execute Program 2.1 by hand on the following data. 


(a) a(1) 
(b) abc(1,2,c) 


(c) [a,b,{1,2}] 


2.26 Extend Program 2.1 to count the number of tuples with more than ten 
arguments. 


2.27 Design a program messup that accepts as an argument a term and produces 
a new term in which all the integers are one greater than in the original. 


2.28 Design a program reorder that takes a term and yields a new term in which 
the arguments to all structures are reversed. 


2.29 Design a program that takes a partial term, waits for it to be filled in and 
counts all the integers and strings in it. By partially filled in we mean that 
some sections of the term are not yet generated and are thus represented 
by variables, e.g., f({1,X}). When run on this input your program should 
suspend until the variable X is bound, e.g., X := a. Thus, the whole input is 


f({1,a}). 





Chapter 3 


Six Basic Techniques 


This chapter is devoted to six fundamental programming techniques. These tech- 
niques provide the building blocks from which all Strand programs are constructed. 
Here is a brief summary of the techniques we will consider: 


Communication Protocols. Three techniques are used to specify 
inter-process communication protocols: producer-consumers, incom- 
plete messages and bounded buffers. 


Difference Lists. A difference list is a representation of a list. The 
technique for manipulating this representation allows it to be con- 
structed in parallel by a set of processes. 


Short-Circuits. The short-circuit technique is used to detect when a 
collection of processes has terminated. 


Blackboards. The blackboard technique allows multiple processes to 
both read and atomically update a shared data structure. 


To a large extent, programming in Strand revolves around the repeated use of 
these six techniques in different guises. 


Six Basic Techniques:


• Producer-Consumers
• Incomplete Messages
• Bounded Buffers
• Difference Lists
• Short-Circuits
• Blackboards







3.1 Communication Protocols 


Chapter 2 introduced the basic concepts involved in organizing inter-process com- 
munication. To briefly review these concepts, consider the following initial process 
pool: 


producer(C), consumer(C) 


Two processes share a common communication channel represented by the shared 
variable, C. The producer is able to send a single message to the consumer via this 
channel. To generate the message, the producer uses an assignment. For example, 
consider the producer definition: 


producer(Out) :- Out := my_message.


This assigns the string my_message to the shared variable C and causes the mes- 
sage to be communicated to the consumer process. The consumer may postpone 
execution until the message arrives using matching as in the rule: 


consumer(my_message) :- perform_action.


Matching the string my_message in the rule head delays the consumer until the
variable has the appropriate value. After the message arrives, the consumer per- 
forms some action. 

This organization suffices when the producer and consumer exchange only a 
single message. If many messages are to be exchanged, more sophisticated tech- 
niques are required. To introduce the ideas behind these techniques we begin with 
an analogy. 


On your way home from the movies you inadvertently trip on an ugly- 
looking bottle and, out of curiosity, pick it up. The bottle is covered 
with a slimy-looking film under which you discern the remnants of an 
exotic label. Intrigued, you feverishly clean away the film. Suddenly, 
from the depths of the bottle, out jumps an enormous green genie who 
is even more ugly than the bottle! Although repulsed, you overcome 
your displeasure when, after considerable prompting, the genie agrees 
to grant you a wish. 

You immediately surmise that this must be a rather cheap genie; 
everyone knows that three wishes is the standard deal for releasing a 
genie. After a moment’s trepidation you suddenly realize how to over- 
come the problem and loudly pronounce: “I wish I had two wishes!” 
The first wish allows you to get that Ferrari you have always wanted. 
What do you use the other wish for? Well, being a somewhat reckless 
driver you decide to play it safe and immediately wish for... two more 
wishes. You recognize that you can continue obtaining more Ferraris 
as long as you keep at least one wish for the sole purpose of obtaining 
new wishes. 


[Cartoon: the genie announces "You've got 1 wish... What is it?" and receives the reply "Uh... I wish for wishes!"]





A variable in Strand is like a wish: Once you have given it a value (used a wish) 
it is gone forever! Remember, the single-assignment rule states that it is only 
possible to assign a value to a variable once. To take this analogy further, we 
should consider what it means to obtain two new wishes (variables). In Strand, 
this can be achieved with the following assignment: 


Wish := [Wish1|Wish2] 


Although this assignment uses up the variable Wish, it creates two new variables: 
Wish1 and Wish2. The operation can be repeated when more variables are required. 
For example: 


W0 := [W1|W2]
W2 := [W3|W4]
W4 := [W5|W6]
W6 := [W7|W8]
etc.




Since the term [W1|[W3|W4]] is equivalent to the term [W1,W3|W4], this set of 
assignments generates the following structure: 


[W1,W3,...|Wn]


The variables W1, W3, etc. can be used to hold values; these correspond to the 
wishes used in our analogy for obtaining a Ferrari. The tail of the list is always a 
variable; this corresponds to the wish used to obtain more wishes. An incomplete 
structure of this type is called a stream; it is constructed incrementally by adding 
elements ([...|...]) to the end of the stream one at a time. The stream is closed 
by binding the unbound tail to an empty list. For example: 


Wn := []


This precludes the generation of any further stream elements and corresponds to 
using the wish kept to obtain more wishes. 


3.1.1 Producer-Consumers 


This is the most elementary form of communication protocol. It involves a pro- 
ducer process and one or more consumer processes. These communicate via a 
single shared channel. For example: 


producer(Wish), consumer(Wish), consumer(Wish) 


The producer sends many messages to the consumers; each consumer simply con- 
sumes the messages when they arrive. Since more than one message is to be 
exchanged, the shared channel needs to be used as a stream. Let us consider the 
effect of this design decision on the definition of each process. 

Stream Producers. Recall that previously we showed how a producer process 
could generate a single message on a shared communication channel. We then 
illustrated how streams are generated using assignments. A simple way to design 
a stream producer is to write the assignments directly into the producer process 
definition. For example: 


producer(Out) :-
    Out := [W1|W2],     % generate two wishes
    W1 := ferrari,      % process first wish
    W2 := [W3|W4],      % two more wishes
    W3 := ferrari.      % process next wish


The producer process corresponds to the genie in our analogy. The assignments 
of the form W := [...|...] generate an element of the stream and those of the form
W := ferrari send a message. The tail of the stream (W4) remains available; it could 
be used to generate further messages by adding more assignments. 

Although simple, this organization is only satisfactory when the number of 
messages to be sent is small. If the number is large, it becomes tiresome to
explicitly list all of the assignments; if an unbounded number of messages are
required, the technique cannot be used at all. Thus, in many situations, it is 
preferable to use an alternative definition that employs iteration; from previous 
discussions, you should recognize iteration as synonymous with recursion. Here is 
a recursive formulation of a producer process that generates an unbounded number 
of messages: 


producer(Out) :-
    Out := [W1|W2],     % generate two wishes
    W1 := ferrari,      % process one
    producer(W2).       % do it again with 2nd


The recursive call to the producer causes the tail of the stream (W2) to be filled 
with another stream element. To understand the operation of this process, consider 
the output variable Out. Initially, it is assigned a structure containing a variable: 


[ferrari|W2]


Subsequently, the variable W2 is filled with another stream element through re- 
cursion: 


[ferrari, ferrari|W3]


Eventually, after some number of iterations, the structure appearing at the output 
corresponds to a sequence of messages. For example: 


[ferrari, ferrari, ferrari,...|Wn] 


Notice that there is always a variable in the tail of this structure. This variable 
is used to add more elements and corresponds to the wish used in our analogy to 
generate extra wishes. 

Chapter 2 pointed to the need for stopping conditions in recursive processes. As 
currently defined, the lack of a stopping condition causes the producer to generate 
an unbounded number of messages. To illustrate how an appropriate condition is 
added, let us define a producer that generates only a fixed number (N) of messages: 


producer(N,Out) :-
    N > 0 |                 % still generating?
    N1 is N - 1,            % one less to generate
    Out := [W1|W2],         % generate two wishes
    W1 := ferrari,          % process one
    producer(N1,W2).        % do it again with 2nd wish
producer(0,Out) :-          % all done?
    Out := [].              % close stream


In this definition, the first rule is responsible for generating messages; the first 
process argument is a number (N) used to count messages. At each iteration the 
number is decremented (N1 is N - 1). After N iterations the count reaches zero
and the second rule is chosen. This rule implements the stopping condition and
closes the output stream. Given this definition an initial process: 


producer(3,Wish)


generates the following stream of messages:


[ferrari, ferrari, ferrari]


Notice that the stream is closed because the tail of the stream is assigned the value 
[] at the stopping condition. 

Although the new producer behaves satisfactorily, it is somewhat inelegant. 
The source of this inelegance is the number of explicit assignments. An improved 
definition is shown below in which the stream element and message are generated 
using a single assignment: Out := [ferrari|W2]. 


producer(N,Out) :-                  % R1
    N > 0 |                         % still generating?
    N1 is N - 1,                    % one less to generate
    Out := [ferrari|W2],            % use one, get one
    producer(N1,W2).                % do it again with 2nd wish
producer(0,Out) :- Out := [].       % R2, all done? close up


In general, producer processes have a standard form that involves both generator 
rules (e.g., R1) and termination rules (e.g., R2). The head and guard of each rule 
specify preconditions under which either an element is added to the stream or the 
stream is closed. 


Prototype Producer Process: 


producer(..., M) :-
    M := [message(...)|M2],
    producer(..., M2).
producer(..., M) :- M := [].





Stream Consumers. A consumer process receives a stream of messages 
generated by a producer. Each message is used to perform an appropriate action 
and then the consumer receives the next message. The definition of a consumer 
process can be developed through a chain of refinements similar to those illustrated 
for the producer; for brevity, only the final process definition is shown here. The 
following process receives and uses a stream of Ferrari messages; it corresponds to 
the lucky movie-goer in our analogy. 




consumer([ferrari|Ms]) :-       % wait for ferrari
    go_ride_ferrari,            % use the input
    consumer(Ms).               % consume rest
consumer([]).                   % terminate


Notice the use of matching in both rules. In the first rule, matching postpones pro- 
cess execution until both a stream element and a message arrive. When a message 
is received, an action is performed (go_ride_ferrari). The consumer then iterates to
consume the rest of the stream (consumer(Ms)). In the second rule, matching is 
used to detect the end of the input stream; the consumer then terminates. 

As you might expect, consumer processes also have a standard form. This 
comprises a termination rule and a number of rules that each consume a message. 


Prototype Consumer Process: 


consumer([message(...)|Ms],...) :-
    consumer(Ms,...).
consumer([],...).
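
The remark that a consumer has "a number of rules that each consume a message" can be illustrated with a consumer that accepts two kinds of message. The following is a minimal sketch, not a definition taken from the text: the process names tally and tally1 and the message porsche are assumptions introduced purely for illustration.

```strand
tally(Ms,F,P) :- tally1(Ms,0,0,F,P).

tally1([ferrari|Ms],F0,P0,F,P) :-       % consume a ferrari message
    F1 is F0 + 1,
    tally1(Ms,F1,P0,F,P).
tally1([porsche|Ms],F0,P0,F,P) :-       % consume a porsche message (hypothetical)
    P1 is P0 + 1,
    tally1(Ms,F0,P1,F,P).
tally1([],F0,P0,F,P) :-                 % termination rule
    F := F0, P := P0.
```

Given the process pool Ms := [ferrari,porsche,ferrari], tally(Ms,F,P), the variables F and P should be assigned 2 and 1 respectively. Each message-consuming rule suspends, through matching, until the next stream element is available.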





An Example Stream Computation. It is important to fully understand 
the producer-consumers technique since a number of stream communication pro- 
tocols are based on it. To illustrate the technique, we will consider the execution of 
Program 3.1. In this program, a generator process produces a stream of numbers 
N, N-1, ..., 2, 1 and a sum process computes the sum of the numbers; the latter
definition was presented in Section 2.6. 


generator(N,S) :-
    N > 0 | S := [N|S1], N1 is N - 1,
    generator(N1,S1).
generator(0,S) :- S := [].


sum(L,Sum) :- sum1(L,0,Sum).
sum1([X|Xs],A,Sum) :-
    A1 is A + X, sum1(Xs,A1,Sum).
sum1([],A,Sum) :- Sum := A.


Program 3.1: Producer-Consumers Protocol 




Table 3.1 shows an example computation where the initial process pool is: 
generator(2,Stream), sum(Stream,Sum) 


As in Chapter 2, the table shows a sequence of numbered computation steps. At 
each step, a process is picked from the process pool of the previous step, the 
process action is specified and the new process pool is shown. 


Table 3.1: An Example Stream Computation 


Step      Pick    Result                Process Pool
0         -       -                     generator(2,Stream), sum(Stream,Sum)
1         2       change state + fork   generator(2,Stream), sum1(Stream,0,Sum)
2         1       change state + fork   N1 is 2-1, Stream := [2|S1],
                                        generator(N1,S1), sum1(Stream,0,Sum)
3         4       suspend               no change
4,5       1,2     both terminate        generator(1,S1), sum1([2|S1],0,Sum)
6         1       change state + fork   N1 is 1-1, S1 := [1|S2],
                                        generator(N1,S2), sum1([2|S1],0,Sum)
7         4       change state + fork   N1 is 1-1, S1 := [1|S2], generator(N1,S2),
                                        A1 is 0+2, sum1(S1,A1,Sum)
8,9,10    1,2,4   all terminate         generator(0,S2), sum1([1|S2],2,Sum)
11        2       change state + fork   generator(0,S2), A2 is 1+2,
                                        sum1(S2,A2,Sum)
12        3       suspend               no change
13        1       change state          S2 := [], A2 is 1+2, sum1(S2,A2,Sum)
14,15     1,2     both terminate        sum1([],3,Sum)
16        1       terminate             Sum := 3
17        1       terminate             < empty > with Sum = 3




The computation illustrates how messages are produced, communicated and 
consumed. In steps 5 and 9, the assignments Stream := [2|S1] and S1 := [1|S2] 
produce messages. Notice the state of the sum1 process following each of these 
steps; it contains the information produced by the generator and illustrates com- 
munication. 

Consider the repeated attempts to reduce the sum1 process in steps 3 and 12. 
In each case the process suspends because a variable in the process state is matched 
to a structure in the rule head. For example, in step 3 Stream is matched to [X|Xs]. 
Only after a process variable is bound, in steps 5 and 9, can the consumer remove 
a message from the stream. 

Step 12 shows that the consumer is unable to terminate until the producer has 
closed the stream. In step 14 the stream is closed (S2 := []). This permits the 
consuming process to finish summing the list and terminate in step 17. 

In summary, the producer-consumers protocol allows a producer process to 
send many messages to one or more consumer processes via a stream. 


Producer-Consumers: 


One-way stream communication. 
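
To make the "one or more consumer processes" part of the protocol concrete, the sketch below adds a second consumer alongside the sum process of Program 3.1. The names len and len1 are assumptions introduced here; they are not defined in the text.

```strand
len(L,N) :- len1(L,0,N).
len1([_|Xs],A,N) :-             % one more element seen
    A1 is A + 1,
    len1(Xs,A1,N).
len1([],A,N) :- N := A.         % stream closed: report the count
```

With the process pool generator(3,S), sum(S,Sum), len(S,Len), the stream S becomes [3,2,1], so Sum should be assigned 6 and Len should be assigned 3. Both consumers read the same stream: consuming a message does not remove it, since a stream is simply a structure that is bound incrementally.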





3.1.2 Incomplete Messages 


Over a period of time you begin to notice the genie changing color. 
At first, the diminishing sickly green appears relatively pleasing but 
you become concerned when the genie suddenly turns crimson! All at 
once the genie thunders: “Baksheesh! Baksheesh!” Drawing on your 
command of ancient languages you immediately surmise that some- 
thing is not quite right and retire to a safe distance. As usual, your 
“Baedeker’s Guide to the Orient” solves all: The genie wants a tip! By 
this time the genie is sufficiently angry that you dare not approach; so 
through a megaphone you arrange a simple method of payment. An 
enormous envelope is stuck onto every Ferrari. When you receive a 
Ferrari, you stuff the envelope with a huge sum of money and post it 
back to the genie. The genie eventually receives the envelope in the 
post and spends the cash. What could be simpler? 




This scheme of payment is analogous to a communication protocol called the in- 
complete message. The central idea is that the producer sends a message contain- 
ing a slot for a reply value. The slot is represented by a variable and is filled with 
a value by the consumer. Filling the slot causes the value to be communicated 
back to the producer. Eventually, the producer receives the reply message and 
performs an appropriate action. 

In the analogy, the envelope generated by the genie corresponds to the slot. 
Filling the envelope with money and posting it corresponds to generating the reply 
value. Just as the genie spends the cash, the producer performs some action in 
response to the reply. 





[Cartoon: the genie shouts "Baksheesh!! Baksheesh!!"]





The following process definition illustrates the actions of the producer in this 
protocol; it extends the prototype producer shown in Section 3.1.1: 


producer(N,Out) :-
    N > 0 |
    N1 is N - 1,
    Out := [ferrari(Envelope)|Ms],
    spend(Envelope),
    producer(N1,Ms).
producer(0,Out) :-
    Out := [].


The main addition to the prototype producer is a variable that is included in 
the output message (Envelope); this corresponds to the envelope in our analogy. 
The first rule specifies that a producer forks into three processes. One process
generates a message with a reply slot (Out := [ferrari(Envelope)|Ms]), another uses
the reply (spend) and the last produces the next message (producer). The second 
rule specifies that the producer terminates and closes the output stream after 
generating N messages. 

The consumer in the protocol is an extended version of the prototype consumer 
process shown in Section 3.1.1. The difference is that when a message arrives, the 
consumer fills in the reply slot using an assignment (Envelope := big_bucks). This is
analogous to placing money in the genie’s envelope and giving it to the postman. 


consumer([ferrari(Envelope)|Ms]) :-     % wait for message
    Envelope := big_bucks,              % send reply
    go_ride_ferrari,                    % use message
    consumer(Ms).                       % wait for more
consumer([]).                           % terminate


After the consumer has generated the reply value, the reply may be inspected and 
manipulated by other processes. For example, the spend process may inspect the 
reply value to ensure that payment is sufficient: 


spend(big_bucks) :-                 % correct price
    call_hollywood_architect.       % waste money
spend(X) :-
    X =\= big_bucks |               % not correct price
    contact_lawyers.                % get worried


Data-flow synchronization delays this process until after the reply message is gen- 
erated by the consumer. This occurs because the first rule uses matching and the 
second uses a guard test. Thus, neither rule may be used until the reply message 
is available. 

In summary, the producer sends a message containing a reply slot. The con- 
sumer sends a reply to the producer by filling in the slot. Thus, the incomplete 
message protocol provides two-way interprocess communication. 


Incomplete Message: 


• Two-way stream communication.

• Achieved by including a reply variable in a message.
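
The same protocol serves as a simple request/reply mechanism. The sketch below is an illustration under stated assumptions: the process names client and server and the message square(X,Reply) are invented here and do not appear in the text.

```strand
client(Out,Result) :-
    Out := [square(4,Reply)],   % send one request with a reply slot, then close
    Result := Reply.            % Result shares whatever value fills the slot

server([square(X,Reply)|Ms]) :-
    Reply is X * X,             % fill the reply slot
    server(Ms).
server([]).                     % request stream closed
```

With the process pool client(C,R), server(C), the variable R should eventually be assigned 16. The client never names the server; the reply travels back through the variable Reply carried inside the message.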







Using Incomplete Messages. To further illustrate the use of this proto- 
col, Program 3.2 shows a router process that distributes messages to two other 
processes. This program might be used to execute the process pool: 


router(Rs,To1,To2),
printer1(To1), 
printer2(To2) 


The router distributes terms representing print jobs to one of two printers. It 
consumes jobs received on an input stream Rs and forwards them via either stream 
To1 or To2 when a printer is available. 


router(Rs,To1,To2) :-                       % R1
    router1(Rs,ready,To1,ready,To2).


router1([P|Rs],ready,To1,D2,To2) :-         % R2
    To1 := [{P,D}|To1a],
    router1(Rs,D,To1a,D2,To2).
router1([P|Rs],D1,To1,ready,To2) :-         % R3
    To2 := [{P,D}|To2a],
    router1(Rs,D1,To1,D,To2a).
router1([],_,To1,_,To2) :-                  % R4
    To1 := [],
    To2 := [].


Program 3.2: Router that uses Incomplete Messages. 


Initially, both printers are defined to be ready; the router process changes state 
to a router1 process with additional arguments representing the initial state of the
printers (R1). A pending print job P is passed to the first printer provided it is 
ready (R2). Alternatively, a job is passed to the second printer if it is ready (R3). 
In both cases, the job is passed as a tuple of the form {P,D}. This is an incomplete 
message; D is a new variable that the printer will assign the value ready when it 
has completed the job. The recursive calls to router1 retain the variable D. If a 
ready value is assigned to this variable by a printer, then the router may send that 
printer another job. Both output streams are closed when the input stream closes 
(R4). 

Note that rules R2 and R3 are not mutually exclusive: If both printers are 
ready, either rule could be used to reduce the process. Recall that in this case 
Strand does not specify which rule (and hence which printer) is selected. 
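
The printer side of this protocol is not shown in the text. A minimal sketch of what such a process might look like follows; the names printer, print_job, release, Status and the value done are all assumptions made for illustration, with print_job standing for some hypothetical process that binds Status to done once job P has been printed.

```strand
printer([{P,D}|Jobs]) :-
    print_job(P,Status),        % hypothetical: binds Status to done when finished
    release(Status,D),
    printer(Jobs).
printer([]).                    % router closed the stream

release(done,D) :- D := ready.  % suspends until the job is finished
```

Matching on done in release delays the assignment D := ready until the job has completed, so the router cannot select this printer again before then. Both printer1 and printer2 could plausibly be defined along these lines.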




3.1.3 Incomplete Messages with Mergers 





It turns out that there is an equally greedy blue genie sleeping in the 
bottle. When this genie awakes, you immediately recognize an oppor- 
tunity to increase Ferrari production. Unfortunately, two problems 
arise. Both genies require Baksheesh, so you must ensure that a genie 
is rewarded only when a wish has been granted. Secondly, since both 
genies compete for Baksheesh, they attempt to give you wishes at the 
same time. 

To solve these problems, you enlist the help of a friend who owns 
a garden with two entrances. On one entrance you place a green label 
and on the other, a blue label. Then you instruct each genie to drop 
off its Ferraris at the appropriate gate and to write its own color on 
the envelopes. Your friend helps by delivering one Ferrari at a time. 
When you receive a Ferrari you detach the associated envelope, stuff 
it with money and give it to the postman. The postman delivers the 
money to the genie whose color is on the envelope. 


It is possible to combine the incomplete message protocol with merger processes. 
This allows a consumer to receive messages from a number of producers. It also 
allows a consumer to send a reply without knowing the identity of the producer.
The use of mergers solves both the problems described in the analogy. Mergers
will be described in detail in Section 3.4; here we provide a brief introduction to 
their use. 

A merger is a process that receives messages on some number of input streams 
and places them onto a single output stream. To illustrate how these processes 
operate, suppose there exists a process merge2. This process receives two input 
streams and produces a single output stream containing all messages on any input 
stream. For example, if the first input stream contains the messages: 


[ message1(...), message2(...) ]


and the second contains the messages:


[ message3(...), message4(...) ]


then the resulting output stream contains all of the input messages, for example:


[ message1(...), message3(...), message2(...), message4(...) ]


The order of messages in each input stream is preserved in the output stream. The 
relative order of messages from different input streams is not defined and cannot 
be assumed. However, it is guaranteed that all input messages will eventually 
appear on the output. From this description it is clear that the merge2 process 
corresponds to the friend in the analogy. The entrances correspond to the input 
streams and the friend handing over Ferraris corresponds to the output stream of 
the merger. An initial process pool that corresponds to the analogy is 


producer(N1,O1), producer(N2,O2), merge2(O1,O2,Out), consumer(Out)


In this pool, there are two instances of the producer process; these correspond to 
the blue and green genies. Each producer generates a stream of messages of the 
form ferrari(Envelope). The output streams from each producer (O1 and O2) are 
merged into a single stream (Out) by the merge2 process. This output stream is 
then consumed by the consumer process. Each producer executes the program 
described in Section 3.1.2. 
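
The text assumes that merge2 exists and defers its definition to Section 3.4. For orientation only, one common way to write a two-input merger is sketched below; it is an assumption and may differ from the definition given later.

```strand
merge2([X|Xs],Ys,Zs) :-         % message available on the first input
    Zs := [X|Zs1],
    merge2(Xs,Ys,Zs1).
merge2(Xs,[Y|Ys],Zs) :-         % message available on the second input
    Zs := [Y|Zs1],
    merge2(Xs,Ys,Zs1).
merge2([],Ys,Zs) :- Zs := Ys.   % first input closed: pass on the second
merge2(Xs,[],Zs) :- Zs := Xs.   % second input closed: pass on the first
```

When both inputs hold a message, either of the first two rules may be used, which is why the relative order of messages from different inputs cannot be assumed. The sketch by itself says nothing about fairness between the two inputs.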

Due to the action of the merger, the consumer receives a stream of indistin- 
guishable messages. For example: 


[ferrari(E1), ferrari(E2), ferrari(E3), ...]


However, each Envelope variable (E1, E2, E3, etc.) is created by one producer 
and is not accessible by the other. Thus, each variable is implicitly related to a 
producer; in our analogy this corresponds to the genie writing its color on the 
envelope. 




Assuming that the consumer pays a standard price for all Ferraris, its definition 
remains unchanged: 


consumer([ferrari(Envelope)|Ms]) :-     % wait for message
    Envelope := big_bucks,              % send reply
    go_ride_ferrari,                    % use it
    consumer(Ms).                       % wait for more
consumer([]).                           % terminate


Assigning the value big_bucks to each Envelope variable is sufficient to reply to
the correct producer. Thus, the reply is made without the consumer having any 
knowledge of which producer sent a message; this corresponds to the postman 
delivering only to the appropriate genie. 

In summary, many producers may send messages to a single consumer via 
mergers. The consumer need not distinguish the producer in order to reply. 


Incomplete Message + 
Mergers: 


Consumer replies to many 
producers without knowing 
their identity. 
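
To see that each reply reaches the producer that created the envelope, the producer of Section 3.1.2 can be given an identity of its own. The variant below is a sketch under assumptions: the extra Color argument and the action record_payment are hypothetical and are not part of the program in the text.

```strand
producer(Color,N,Out) :-
    N > 0 |
    N1 is N - 1,
    Out := [ferrari(Envelope)|Ms],
    spend(Color,Envelope),          % this producer's own reply handler
    producer(Color,N1,Ms).
producer(_,0,Out) :-
    Out := [].

spend(Color,big_bucks) :-           % suspends until the envelope is filled
    record_payment(Color).          % hypothetical action
```

A corresponding process pool might be producer(green,2,O1), producer(blue,2,O2), merge2(O1,O2,Out), consumer(Out). The messages seen by the consumer carry no color, yet each spend process is woken only by the reply to its own Envelope variable.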





3.1.4 Bounded Buffers 


At first you are overjoyed by the ability to produce Ferraris at will but 
soon realize there is not enough room to put them! You devise a clever 
strategy to solve this problem: You agree with the genies that they 
will try to fill the garden while you use the Ferraris. If the garden gets 
full, the genies are allowed to take a rest and drink some tea; but, if 
you remove a Ferrari and space becomes available, then a genie must 
fill it as soon as possible. 


In both the producer-consumers and incomplete-message protocols the producer 
generates messages without regard for the number consumed. Messages that are 
generated but not yet consumed are stored by the Strand system. This has an
unfortunate consequence: If a large number of messages are stored, eventually 
there may not be sufficient space. This situation is analogous to running out of 
garden space to store Ferraris. 










Strand will inform you when there is no further space available for saving 
messages. Should this situation arise, it is possible to redesign those processes 
that produce an excessive number of messages. The technique used to achieve this 
redesign is called the bounded buffer. It regulates a producer so that the number 
of outstanding messages is bounded by the size of a message buffer; the buffer 
corresponds to the garden in the analogy. If the buffer becomes full, the producer 
waits until there is room in the buffer; this corresponds to the genie taking a rest. 
The consumer removes elements from the buffer when they are available. Thus, it 
is the consumer that creates room in the buffer and controls the rate of message 
generation. In the same way, it is the wisher that makes room in the garden 
allowing the genie to produce more Ferraris. 

Unlike the previous protocols, in the bounded buffer the consumer generates 
the stream. However, each element of the stream is simply a variable. The pro- 
ducer may use these variables to send messages to the consumer. A fixed number of 
stream elements are initially made available to the producer for sending messages; 
these form an initial message buffer: 


Buffer := [M1, M2, M3, ...|Mn]


The producer is now redefined to send a message only if there is space in the 
buffer; it terminates if the stream is closed: 




producer([M|Ms]) :-
    M := ferrari,
    producer(Ms).
producer([]).


Notice the difference between this producer and those defined in previous sections: 
Instead of generating the stream, this producer waits for the stream to be gen- 
erated. If the producer uses all the available buffer space, matching causes it to 
suspend. This corresponds to the genie taking a tea break while the garden is full. 

The consumer process is not only responsible for consuming messages but also 
for adding elements to the end of the buffer. To achieve this, the consumer must 
have access to the stream tail. Thus, the initial process pool is: 


Buffer := [M1, M2, M3, ...|Mn],
producer(Buffer),
consumer(N,Buffer,Mn)


The consumer is now redefined to wait for a message, consume it and add an 
element to the buffer; this corresponds to removing a Ferrari from the garden and 
making room for another. Here we show a consumer that requests a fixed number 
N of Ferraris. When this process terminates, it closes the stream of requests. 


consumer(N,[ferrari|Ms],Buffer) :-
    N > 0 |
    N1 is N - 1,
    go_ride_ferrari,
    Buffer := [X|Bs],
    consumer(N1,Ms,Bs).
consumer(0,_,Buffer) :-
    Buffer := [].


Adding an element to the buffer allows the producer to generate another message 
if the buffer was previously full; this corresponds to a genie ending a tea break 
and producing more Ferraris. 

In summary, the bounded buffer protocol places a limit on the number of 
messages that are produced but not yet consumed. This limit corresponds to the 
size of a message buffer through which communication is conducted. 




Bounded Buffers: 


• Uses a buffer to limit outstanding messages.

• Consumer generates buffer.

• Producer inserts messages.
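
Writing out the initial buffer by hand is practical only for small sizes. A possible helper that builds a buffer of N empty slots is sketched below; make_buffer is a name assumed here for illustration and is not defined in the text.

```strand
make_buffer(N,Buffer,End) :-
    N > 0 |
    N1 is N - 1,
    Buffer := [_|Bs],               % one empty slot
    make_buffer(N1,Bs,End).
make_buffer(0,Buffer,End) :-
    Buffer := End.                  % leave the tail open for the consumer
```

The process pool might then read make_buffer(4,Buffer,End), producer(Buffer), consumer(10,Buffer,End), using the producer and consumer defined above. The third argument gives the consumer access to the stream tail, as the protocol requires.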





Using Bounded Buffers. Consider the following problem: 


A bottle factory employs two workers, Mary and John. John puts soda 
into bottles which Mary seals. John is not permitted to get more than 
four bottles ahead of Mary, as otherwise gas escapes and the soda goes 
flat. When John stops work, Mary seals any remaining bottles before 
stopping herself. 


This situation is similar to that described in the previous story. Program 3.3 shows 
that it can be represented using the bounded buffer protocol. 


producer(N,[M|Ms]) :-                               % R1
    N > 0 |
    M := bottle, N1 is N - 1,
    producer(N1,Ms).
producer(0,[M|_]) :- M := stop.                     % R2


consumer([bottle|Bs],Buff,Os) :-                    % R3
    Buff := [_|Buff1], Os := [sealed_bottle|Os1],
    consumer(Bs,Buff1,Os1).
consumer([stop|_],_,Os) :- Os := [].                % R4


Program 3.3: Alternative Bounded Buffer Protocol 


The consumer (Mary) produces requests for bottles; the producer (John) fills 
these requests. As before, the consumer adds to the end of the buffer each time a 
message is received (R3). Note how the consumer forwards a sealed_bottle message 
on an output stream Os each time it receives a bottle message from the producer 
(R3). The main difference between Program 3.3 and the previous bounded buffer 
program is that the producer (John) terminates communication. After sending N 
messages, the producer sends a special message stop to the consumer (R2). The 
consumer process terminates upon receiving this message and closes the output 
stream (R4). This program can be executed using the process pool: 


Buff := [_,_,_,_|End], producer(100,Buff), consumer(Buff,End,Os)


The initial buffer contains four variables; this allows the producer to generate 
a maximum of four unprocessed bottles before suspending. Execution of these 
processes using Program 3.3 creates a list Os containing 100 sealed_bottle messages. 


3.2 Difference Lists 







Tommy and his sister Jenny are ecstatic. For Christmas they received 
a joint present: a huge box of LEGO bricks. Never having seen a 
toy of this type, they eye the brightly colored bricks with fascination. 
After long and arduous experiments they eventually discover that every 
brick has a common feature: The top has a circular stud and the 
bottom has a hole into which a stud fits. By inserting a stud into a 
hole it is possible to build a new toy! After some negotiation they 
decide to play the following game: They make a big pile from the 
bricks and furiously grab bricks to assemble them. Sometimes they 
put the assembled bricks back on the heap and sometimes they just 
grab another brick and add it to the existing assembly. Eventually, 
they end up with only two toys on the heap; they then stick these two 
together. The object of the game is to create one big toy from all the 
bricks as quickly as possible. 


A difference list is a representation of a list that resembles a big LEGO toy. The 
list is constructed from independent parts that can be assembled in any arbitrary 
order or in parallel. Assembling a difference list is analogous to the children’s game 
where bricks are assembled at random. Each part of a difference list corresponds 
to a single LEGO brick and has the form: 


[brick|T] / T




By convention, the operator “/” signifies a difference list; its first argument is an 
element of the list ([brick|T]) and the second argument is the variable tail of that 
element (T). The list element corresponds to the stud on a LEGO brick and the 
tail (T) corresponds to the hole in the bottom of a brick. Recall that two bricks 
are assembled by placing the stud of one in the hole of another. By analogy, two 
parts of a difference list are assembled by placing the element of one in the tail of 
another. For example, consider the following bricks: 


[brick1|T1] / T1 and [brick2|T2] / T2


These are assembled with the assignment:


T1 := [brick2|T2]


This yields a segment of a difference list that contains both bricks:


[brick1,brick2|T2] / T2


More generally, two difference lists A/B and C/D are combined to give a differ- 
ence list A/D using the assignment B := C. 


Using difference lists: 


The tail of the list must be 
an accessible variable. 





Let us use a slightly more complex example to illustrate some other aspects of 
the representation. Consider building a LEGO toy from four bricks colored blue, 
green, red and yellow. This is simulated by representing each brick as part of a 
difference list. Thus, the initial heap of bricks is represented by the parts: 


[blue|T1] / T1 
[green|T2] / T2 
[red|T3] / T3 
[yellow|T4] / T4 


Initially, the toy to be built does not yet exist and can be represented by an empty 
difference list: 


Toy / ToyEnd 


3.2. Difference Lists 75 


The toy is assembled by executing the following set of assignments. 


Toy := [blue|T1] 
T1 := [green|T2] 
T2 := [red|T3] 
T3 := [yellow|T4] 
T4 := ToyEnd 
ToyEnd := [] 


Notice that all these assignments are independent and can be carried out in any 
order or in parallel. Each assignment incrementally builds a portion of the resulting 
structure. Regardless of the order in which the assignments are executed, when 
they are all complete the variable Toy will be bound to the list: 


[blue, green, red, yellow] 
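
The independence of the six assignments can be checked with a small Python sketch (not Strand). It assumes a simplified model in which each variable is a name in a dictionary and every assignment is applied exactly once; the function name build is hypothetical.

```python
import random

# Each pair (name, term) stands for one assignment `name := term`.
# A term is a (head, tail_name) cell, another variable name, or None for [].
assignments = [
    ("Toy", ("blue", "T1")),
    ("T1", ("green", "T2")),
    ("T2", ("red", "T3")),
    ("T3", ("yellow", "T4")),
    ("T4", "ToyEnd"),
    ("ToyEnd", None),
]


def build(order):
    """Apply the assignments in the given order, then read back Toy."""
    store = {}
    for name, term in order:
        assert name not in store, "each variable is assigned once"
        store[name] = term
    items, term = [], "Toy"
    while True:
        while isinstance(term, str):   # follow variable names to their terms
            term = store[term]
        if term is None:               # reached []
            return items
        head, term = term              # term is a (head, tail_name) cell
        items.append(head)


shuffled = assignments[:]
random.shuffle(shuffled)
print(build(shuffled))  # ['blue', 'green', 'red', 'yellow']
```

Whatever permutation is chosen, the same list results, because each assignment touches a different variable.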


Many processes may cooperate 
to construct a difference list 
in parallel. 





Let us briefly consider one particular order of the assignments listed above. Exe- 
cuting the second and fourth assignments generates two segments of a difference 
list: 


[blue, green|T2] / T2 and [red, yellow|T4] / T4 


Now we execute the third assignment. This has the effect of concatenating the 
two list segments and yields the following new segment of a difference list: 


[blue, green, red, yellow|T4] / T4 


Difference lists can be 
concatenated in constant 
time. 







This example demonstrates the main concepts involved in manipulating difference 
lists but does not illustrate how to design processes that use them. We now 
illustrate this aspect of the representation by designing a program that builds the 
toy constructed in the example. First we define a process that adds a single brick 
to the toy: 


add_brick(Brick,Bb/Be) :- Bb := [Brick|Be]. 


This rule places a single brick ([Brick|...]) between two points in a difference list. 
The first point corresponds to the beginning of the brick (Bb) and the second 
corresponds to the end of the brick (Be). Notice how the end of the difference list 
(Be) is fed into the tail of the inserted element; this is analogous to placing the 
stud of the next brick in the hole of the current brick. Now it is easy to write a 
process that produces the toy: 


generate_toy(Tb/Te) :- 
    add_brick(blue,Tb/Tm1), 
    add_brick(green,Tm1/Tm2), 
    add_brick(red,Tm2/Tm3), 
    add_brick(yellow,Tm3/Te). 


This process spawns four add_brick processes that cooperate to construct a toy. 
The toy is created between points Tb and Te, the beginning and end of a difference 
list. The blue block is placed between Tb and some intermediate point Tm1, the 
green between Tm1 and another intermediate point Tm2, etc. Notice that the final 
yellow block is placed between the point Tm3 and the end of the toy Te. The toy 
is generated using an initial process: 


generate_toy(Toy/[]) 


The empty list is threaded through the processes to eventually terminate the 
difference list. Thus, the variable Toy in the initial process is eventually bound to 
the expected structure: 


[blue, green, red, yellow] 


Another technique is valuable in using this representation: Adding elements con- 
ditionally. To illustrate the technique, consider an alternative add_brick process 
that only adds non-black bricks. Here is the modified process: 


add_brick(Brick,Tb/Te) :- 
    Brick =\= black | 
    Tb := [Brick|Te]. 
add_brick(black,Tb/Te) :- 
    Tb := Te. 




The important aspect of this process is the assignment: Tb:=Te. This signifies that 
nothing is assembled between the points Tb and Te in the difference list. Thus, 
modifying the generate_toy rule to include black bricks has no effect on the final 
result. For example: 


generate_toy(Tb/Te) :- 
    add_brick(black,Tb/Tm1), 
    add_brick(blue,Tm1/Tm2), 
    add_brick(black,Tm2/Tm3), 
    add_brick(black,Tm3/Tm4), 
    add_brick(green,Tm4/Tm5), 
    add_brick(red,Tm5/Tm6), 
    add_brick(yellow,Tm6/Tm7), 
    add_brick(black,Tm7/Tm8), 
    add_brick(black,Tm8/Te). 


It is important to understand that the operator “/” is a notational convenience 
and has no semantic significance in the program. Often it is used during pro- 
gram development to highlight difference lists but is later removed for efficiency. 
A functionally equivalent, but more efficient program to compute the above toy is 
shown in Program 3.4. 


generate_toy(Tb,Te) :- 
    add_brick(black,Tb,Tm1), 
    add_brick(blue,Tm1,Tm2), 
    add_brick(black,Tm2,Tm3), 
    add_brick(black,Tm3,Tm4), 
    add_brick(green,Tm4,Tm5), 
    add_brick(red,Tm5,Tm6), 
    add_brick(yellow,Tm6,Tm7), 
    add_brick(black,Tm7,Tm8), 
    add_brick(black,Tm8,Te). 


add_brick(B,Tb,Te) :- 
    B =\= black | 
    Tb := [B|Te]. 
add_brick(black,Tb,Te) :- 
    Tb := Te. 


Program 3.4: Building a Toy 




“/” has no significance; 
remove for efficiency. 





Using Difference Lists. To further illustrate the use of difference lists we 
consider the problem of forming the intersection L of two lists L1 and L2. The 
algorithm considers each element X of L1 in turn. If X is a member of L2, then it 
is added to the intersection; otherwise nothing is added. 


intersect(L1,L2,L) :- intersect1(L1,L2,L,[]).       % R1 


intersect1([X|L1],L2,Lb,Le) :-                      % R2 
    member_add(X,L2,Lb,Lm), 
    intersect1(L1,L2,Lm,Le). 
intersect1([],_,Lb,Le) :- Lb := Le.                 % R3 


member_add(X,[X|_],Lb,Le) :-                        % R4 
    Lb := [X|Le]. 
member_add(X,[X1|L2],Lb,Le) :-                      % R5 
    X =\= X1 | member_add(X,L2,Lb,Le). 
member_add(_,[],Lb,Le) :- Lb := Le.                 % R6 


Program 3.5: List Intersection 


Program 3.5 solves this problem by representing the intersection as a difference 
list (Lb,Le). It recursively spawns a member_add process for each element X of 
L1 (R2). This process inserts X into the difference list if X is a member of L2 
(R4); otherwise it inserts nothing in the list (R5, R6). The member_add process is an 
extension of the member process defined in Section 2.6. Executing Program 3.5 
with the following process causes the variable L to be assigned the value [2,3]: 


intersect([1,2,3],[3,2,4],L) 


In summary, a difference list is a representation of a list where the tail is an 
accessible variable. The list can be constructed in parallel by a set of processes. 
Difference lists can be concatenated in constant time. 
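
The structure of Program 3.5 can be mirrored in a Python sketch. This is an analogue under stated assumptions rather than a translation: Var, deref and to_list are hypothetical helpers standing in for Strand's single-assignment variables, and the member_add steps are run in a shuffled order to imitate processes that execute independently.

```python
import random

NIL = None  # stands for the empty list []


class Var:
    """Single-assignment variable, unbound until bind() is called."""

    def __init__(self):
        self.bound, self.value = False, None

    def bind(self, value):
        assert not self.bound, "a variable may be assigned only once"
        self.bound, self.value = True, value


def deref(term):
    while isinstance(term, Var) and term.bound:
        term = term.value
    return term


def to_list(term):
    items, term = [], deref(term)
    while term is not NIL:
        head, tail = term
        items.append(head)
        term = deref(tail)
    return items


def member_add(x, l2, lb, le):
    if x in l2:
        lb.bind((x, le))   # as in R4: Lb := [X|Le]
    else:
        lb.bind(le)        # as in R6: Lb := Le, nothing is inserted


def intersect(l1, l2):
    result = Var()
    tasks, lb = [], result
    for x in l1:           # as in R2: one member_add per element of L1
        lm = Var()
        tasks.append((x, lb, lm))
        lb = lm
    lb.bind(NIL)           # as in R1 and R3: the list ends with []
    random.shuffle(tasks)  # the order of execution does not matter
    for x, b, e in tasks:
        member_add(x, l2, b, e)
    return to_list(result)


print(intersect([1, 2, 3], [3, 2, 4]))  # [2, 3]
```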


3.3 Short-Circuits 


Officer Jenkins is responsible for organizing the chain gang at the state 
penitentiary. Every morning he wakes all the inmates and chains them 
together to ready them for their day’s work. Every criminal is chained 
to both his neighbors at the ankle: The left leg of one criminal is 
chained to the right leg of the next, etc. Each of the criminals is then 
allotted an amount of work at the quarry. Sometimes, during the day, 
it turns out that a job is particularly tough. When this happens, the 
chain is broken and more prisoners are linked into it at the appro- 
priate location to help with the work. At the end of the day, when 
all criminals have finished their work, Jenkins takes them back to the 
penitentiary and goes for supper. 

Unfortunately, over the years, prisons have become so overcrowded 
that verifying the criminals have all finished their work takes half the 
night. Jenkins comes up with a quick way to find out when all the 
criminals are finished. He fastens one end of the chain to one side of 
a car battery and the other end to a loud horn. The other side of the 
battery he connects directly to the horn. Then he announces that the 
criminals should stand at attention, with their ankles firmly together, 
when they have completed their jobs. This has the effect of shorting- 
out the left and right sides of each prisoner’s portion of the chain. As a 
result, when all the criminals have finished their work, the horn blasts. 


The short-circuit technique is used to detect termination of a collection of pro- 
cesses; it operates in much the same way as Jenkins’ scheme. When a network of 
processes is created, each process is spawned with two ends of a chain represented 
by variables; these correspond to the ends of a chain used to restrain a single 
prisoner. If a process forks, the chain is broken among the forked processes. For 
example: 


process(...,Left,Right) :- 
    process1(...,Left,Middle1), 
    process2(...,Middle1,Middle2), 
    process3(...,Middle2,Right). 


This corresponds to new prisoners being linked into the chain to help with a 
difficult job. Notice how the chains are threaded as if from the left ankle of one 
prisoner to the right ankle of another. If a process terminates, it shorts-out its 
own section of the chain by binding its variables together. For example: 


process1(...,Left,Right) :- Right := Left. 




This is analogous to a prisoner finishing an allotted job: The prisoner stands at 
attention, thus shorting out a particular part of the chain. 

The initial process places a constant (done) on the left of the chain. For 
example: 


process(...,done,AllDone) 


When all processes have terminated, the right end of the chain (AllDone) will 
eventually be bound to the same constant. This signifies that all processes have 
terminated and is analogous to Jenkins’ scheme: The horn sounds (AllDone=done) 
when all prisoners are finished. 
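
The behavior of the chain can be simulated in a few lines of Python. This is a sketch only, not Strand: Var and deref are hypothetical stand-ins for single-assignment variables, make_chain is an invented helper, and the workers "terminate" in a random order chosen by the script.

```python
import random


class Var:
    """Single-assignment variable, unbound until bind() is called."""

    def __init__(self):
        self.bound, self.value = False, None

    def bind(self, value):
        assert not self.bound, "a variable may be assigned only once"
        self.bound, self.value = True, value


def deref(term):
    while isinstance(term, Var) and term.bound:
        term = term.value
    return term


def make_chain(n):
    """Thread a chain through n workers; the left end carries 'done'."""
    left, workers = "done", []
    for _ in range(n):
        right = Var()
        workers.append((left, right))
        left = right
    return workers, left           # left is now the right end (AllDone)


workers, all_done = make_chain(5)
random.shuffle(workers)            # workers may finish in any order
for left, right in workers:
    # While any worker is still running, the horn stays silent.
    assert deref(all_done) != "done"
    right.bind(left)               # terminate: Right := Left
print(deref(all_done))             # done
```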


Short-Circuit: 

• Detects global termination. 
• Propagate a constant through a chain; when it 
  appears, termination is detected. 





To demonstrate the technique we will show how to detect the termination of a 
specific process. Recall the simple intersection process defined in Section 3.2. 
Program 3.6 is a modified version of the intersection program that detects ter- 
mination. The process both constructs a list L representing the intersection of 
lists L1 and L2 and assigns the string done to a variable D when the intersection is 
completed. 




intersect(L1,L2,L,D) :- intersect1(L1,L2,L,[],done,D).      % R1 


intersect1([X|L1],L2,Lb,Le,L,R) :-                          % R2 
    member_add(X,L2,Lb,Lm,L,M), 
    intersect1(L1,L2,Lm,Le,M,R). 
intersect1([],_,Lb,Le,L,R) :-                               % R3 
    assign(Lb,Le,Done), link(Done,L,R). 


member_add(X,[X|_],Lb,Le,L,R) :-                            % R4 
    assign(Lb,[X|Le],Done), link(Done,L,R). 
member_add(X,[X1|L2],Lb,Le,L,R) :-                          % R5 
    X =\= X1 | member_add(X,L2,Lb,Le,L,R). 
member_add(_,[],Lb,Le,L,R) :-                               % R6 
    assign(Lb,Le,Done), link(Done,L,R). 


link([],L,R) :- R := L.                                     % R7 


Program 3.6: Adding a Short-Circuit 


The first rule of the original intersect1 process definition specifies process fork- 
ing. Thus, the chain, represented by the variables L (left) and R (right), is threaded 
through processes in the rule body (R2). In order to detect the termination of 
assignments in the original program, a predefined assign process is used (R3,4,6). 
This is equivalent to an ordinary assignment (X := Y) but assigns its third argu- 
ment the value [] when the assignment is complete. A link process is used to close 
the circuit when this value is detected (R7). 

Note that the order in which the chains are closed is unimportant. The pro- 
cesses can terminate in any order or in parallel: Global termination will still be 
detected. This is also true in our analogy: The prisoners may finish their jobs 
independently; provided that the prisoners all stand at attention, the horn will 
eventually blast. 

The modified process is executed in a similar fashion to the original but includes 
an extra argument which is used to signal termination: 


intersect([a,b,c],[x,b,y],L,D) 


The variable D is bound to the constant done when the intersection is complete 
and all processes in the original program have terminated. 


3.3.1 Task Sequencing 


We have shown how the short-circuit technique can be used to detect process 
termination. This makes it possible to sequence groups of processes. Consider the 
following process pool. 




intersect1([a,b],[a],Lb,Lm,done,D1), 
intersect1([c,d],[e],Lm,[],D1,D2), 
wait_till(D2) 


Notice that a chain is threaded through all three processes; thus, all execute 
concurrently. However, let us define the wait_till process as follows: 


wait_till(done) :- big_task. 


This process waits for a done message and then executes some large collection of 
processes (big_task). The arrival of the done message indicates that all intersection 
processes have terminated. Data-flow synchronization ensures that the big_task is 
executed only after all of these processes finish execution. 


The short-circuit technique 
provides a general method for 
task sequencing. 





3.3.2 Implementing Testing Predicates 


A second important use of the short-circuit technique is to implement testing 
predicates. These are processes that indicate if some condition is satisfied: They 
return true if the condition is satisfied and false otherwise. Testing predicates 
are implemented by placing the string true on one end of a short-circuit chain. 
Along the chain, each intermediate process that verifies the condition shorts-out 
its portion of the chain as in the termination detection scheme. A process that 
determines that the condition is not satisfied inserts the string false on its right 
chain. Thus, either true is propagated throughout the chain or false is propagated 
from the rightmost unsatisfied process. 

To illustrate this idea, we will implement a testing predicate all_integers. This 
accepts a list of numbers and tests the condition: 


“all numbers in the list are integers” 


It returns true if they are all integers and false otherwise. 


all_integers([X|Xs],L,R) :- 
    integer(X) | all_integers(Xs,L,R). 
all_integers([X|Xs],L,R) :- 
    real(X) | R := false. 
all_integers([],L,R) :- R := L. 




The first rule propagates the string true if the head of the input list is an integer. 
The second rule inserts the string false if the head is a real number. The final rule 
simply propagates the result at the end of the list. A typical process that returns 
a Result of true using this predicate is: 


all_integers([1,13,88,19],true,Result) 


A process that returns a Result of false using this predicate is: 


all_integers([1.1,13,88.4,19],true,Result) 
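
For comparison, the three rules can be written as a plain Python function. This is a sequential sketch, not Strand: the left end of the chain is passed in as an argument and the value reaching the right end is returned, with the strings "true" and "false" standing for the Strand strings. Raising an error for non-numeric elements is an assumption of the sketch; the Strand rules simply do not cover that case.

```python
def all_integers(xs, left):
    """Return the value delivered at the right end of the chain."""
    for x in xs:
        if isinstance(x, float):
            return "false"      # second rule: R := false
        if not isinstance(x, int):
            raise TypeError("expected a number")
        # first rule: an integer passes the chain along unchanged
    return left                 # final rule: R := L


print(all_integers([1, 13, 88, 19], "true"))       # true
print(all_integers([1.1, 13, 88.4, 19], "true"))   # false
```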


Testing Predicates: 

• Propagate the string true. 
• Insert the string false. 





In summary, the short-circuit technique is used to detect the termination of a col- 
lection of processes. It provides a general mechanism to implement task sequencing 
and can also be used to define complex testing predicates. 


3.4 Blackboards 


A class of extroverted children is organizing a skit for the school’s 
annual parents’ evening. Mr. Williams, the teacher, announces that he 
would like everyone to come up with ideas for the show. Immediately 
there is chaos; every child has a favorite theme and they all begin 
shouting at once. 

Repeated attempts to calm the class come to no avail and eventually 
Mr. Williams leaves the class in frustration. When he returns, he is 
carrying a mailbox under his arm. In a loud voice he asks the children 
to write any idea on a piece of paper and deposit it into the mailbox. 
Mr. Williams then removes the papers one at a time and writes the 
ideas in a book. An inquisitive child who wishes to find out the contents 
of the book must place a blank piece of paper in the box; Mr. Williams 
copies the current list onto this piece of paper and hands it back. 


A blackboard is a data structure that can be both read and atomically updated by 
a collection of processes concurrently; it corresponds to the book in our analogy. 
The main difficulty in using a blackboard arises from the need to allow many 
processes to write information onto it; this corresponds to the problem of allowing 
many children to contribute ideas to a skit. 





Figure 3.1 shows the organization adopted when programming in Strand; it 
closely resembles Mr. Williams’ solution to the problem. A set of processes are 
connected via a merger to a manager process. The processes generate messages 
containing information to be entered onto the blackboard. The merger receives 
these messages and forwards them on a single stream to the manager. The merger 
is a predefined Strand process that allows many processes to write on a single 
stream. The manager process encapsulates the blackboard structure and it is the 
only process that reads or writes it. Since the manager receives a single stream, it 
may process requests to read or update the structure one at a time. Thus, updates 
are performed in a single indivisible step (atomically). The readers and writers 
correspond to the children in the analogy, the merger corresponds to the mailbox 
and the manager to Mr. Williams. 


Figure 3.1: Blackboard Organization (blackboard, manager, input stream, readers, writer) 


3.4.1 Mergers 


In the analogy, children may decide to contribute at any time by placing a paper 
in the mailbox. In addition, previously active children may return to reading 
their comics at the back of the classroom. An analogous situation occurs with a 
merger: It is useful to be able to add and remove input streams dynamically. The 
predefined merger process supports this via a special message of the form merge(S). 
This is interpreted as a request to add a stream S as a new input stream to the 
merger. Subsequently, messages appearing on stream S will appear on the merger 
output stream. For example, let us assume that the input stream to the merge 
network is: 


[msg1(...), merge(S), msg2(...)] 


and the stream S contains the following messages: 


[msg3(...), msg4(...)] 


Then the output of the merger will eventually contain an intermingling of both 
the messages on its input stream and the messages on the stream S. For example: 


[msg1(...), msg3(...), msg2(...), msg4(...)] 


Recall that the Strand merger guarantees that the order of messages in each 
input stream is preserved in the output stream. In addition, all messages that 
appear at an input will eventually appear at the output. The order in which 
messages from different input streams appear is not defined. 
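
These guarantees can be imitated with threads and a shared queue. The Python sketch below is an analogue, not the Strand merger: forward and merger are hypothetical names, a merge(S) request is modeled as the tuple ("merge", S), and the output is collected only after every input has been exhausted, whereas a Strand merger produces its output stream incrementally.

```python
import queue
import threading


def forward(stream, out):
    """Copy one input stream to the shared output, honoring merge requests."""
    children = []
    for msg in stream:
        if isinstance(msg, tuple) and msg[0] == "merge":
            # merge(S): start forwarding the new input stream S as well
            child = threading.Thread(target=forward, args=(msg[1], out))
            child.start()
            children.append(child)
        else:
            out.put(msg)
    for child in children:
        child.join()


def merger(stream):
    out = queue.Queue()          # thread-safe FIFO shared by all inputs
    forward(stream, out)
    merged = []
    while not out.empty():
        merged.append(out.get_nowait())
    return merged


merged = merger(["msg1", ("merge", ["msg3", "msg4"]), "msg2"])
print(merged)  # one possible interleaving; only per-stream order is fixed
```

Each input is copied by a single thread in order, so msg1 always precedes msg2 and msg3 always precedes msg4; the relative order of messages from different inputs depends on scheduling.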




3.4.2 Blackboard Implementation 


Using the merge network, it is a simple matter to define the blackboard organiza- 
tion illustrated in Figure 3.1. The initial process pool illustrates how the necessary 
streams are connected: 


writer(S1), writer(S2), 
reader(S3), reader(S4), 
merger([merge(S1),merge(S2),merge(S3),merge(S4)],M), 
manager(M). 


Two reader and two writer processes produce streams of requests to access infor- 
mation on the blackboard. The merger process ensures that all requests produced 
by these processes will eventually appear at input stream M to the manager. 

The manager process encapsulates the blackboard data structure and defines 
operations on it. Here we present one possible definition for the manager. Let 
us assume that Mr. Williams fills one page of the book at a time. Readers can 
request the entire contents of the book, and writers may add an idea to the end 
of the current page. The following process specifies this behavior and maintains a 
list that represents the blackboard. 


manager(M) :- manager(M,[]). 


manager([read(BB)|M],L) :- BB := L, manager(M,L). 
manager([write(E)|M],L) :- manager(M,[E|L]). 
manager([],_). 


The manager process services two types of request: a read request, which returns 
the current contents of the blackboard, and a write request, which adds an element 
to the blackboard. The manager updates the blackboard by recursing with a 
modified state. Subsequent requests access the new state; the update is hence an 
indivisible or atomic action. 

The readers and writers are stream producers and are defined using the stan- 
dard techniques described in Section 3.1. Note that the read request is an example 
of an incomplete message. 


Blackboards: 


Allow atomic access to data 
structures. 







Using Blackboards. To further illustrate the use of blackboards, we consider 
the problem of maintaining a record of tasks executed in some system. This record 
is to be updated every time a task completes. Other processes may access the 
record to determine the mean task length and the length of the longest task that 
has so far completed. 

The record of tasks is a shared data structure that will be accessed and updated 
by many processes. Hence, it is naturally represented as a blackboard. The process 
structure used in the classroom example can be adapted for this problem; only the 
manager process needs to be modified. The new manager is given in Program 3.7. 


manager(Rs) :- manager(Rs,{0.0,0.0,{no_job,0}}).        % R1 


manager([task(N,L)|Rs],{C,T,B}) :-                       % R2 
    C1 is C+1, T1 is T+L, 
    longest(B,N,L,B1), 
    manager(Rs,{C1,T1,B1}). 
manager([mean(M)|Rs],{C,T,B}) :-                         % R3 
    M is T/C, 
    manager(Rs,{C,T,B}). 
manager([longest(B1)|Rs],{C,T,B}) :-                     % R4 
    B1 := B, 
    manager(Rs,{C,T,B}). 
manager([],_).                                           % R5 


longest({N,L},_,L1,B) :- L > L1 | B := {N,L}.            % R6 
longest({_,L},N1,L1,B) :- L =< L1 | B := {N1,L1}.        % R7 


Program 3.7: A Job Monitor 


The manager process maintains a state representing the tasks that have com- 
pleted to date. This state consists of the total number of tasks, the total length 
of these tasks, and the longest task. The first rule initializes the state with ap- 
propriate values (R1). The manager then services three types of request: A task 
request registers a new task (R2), mean and longest requests ask for information 
(R3,4). A longest process (R6,7) is used to update the state component represent- 
ing the longest task to date (B) when a new task is received. Notice that when 
the manager state is modified, the manager recurses with the new state. Thus, 
subsequent requests are serviced with respect to the new state and updates are 
performed atomically. This program can be executed with the process pool: 


manager(Rs), Rs := [task(peter,10), task(helen,12), mean(M)] 


This assigns the value 11 to the variable M. 
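
The same request-servicing loop can be sketched in Python. This is an analogue under stated assumptions, not Strand: requests are tuples, the Reply class is a hypothetical stand-in for the unbound variable carried by an incomplete message, and the requests are served from an ordinary list rather than a merged stream.

```python
class Reply:
    """Stands in for the unbound variable of an incomplete message."""

    value = None


def longest(best, name, length):
    """Keep the old record only if it is strictly longer (as in R6, R7)."""
    return best if best[1] > length else (name, length)


def manager(requests):
    # State: number of tasks, total length, longest task so far (as in R1).
    count, total, best = 0.0, 0.0, ("no_job", 0)
    for request in requests:       # one request at a time, hence atomic
        kind = request[0]
        if kind == "task":
            _, name, length = request
            count, total = count + 1, total + length
            best = longest(best, name, length)
        elif kind == "mean":
            request[1].value = total / count
        elif kind == "longest":
            request[1].value = best


m, b = Reply(), Reply()
manager([("task", "peter", 10), ("task", "helen", 12),
         ("mean", m), ("longest", b)])
print(m.value)  # 11.0
print(b.value)  # ('helen', 12)
```

Because the loop finishes one request before starting the next, every read sees a consistent state, which is the property the single merged stream provides in Strand.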




3.5 Summary 


This chapter has introduced six fundamental programming techniques. Once mas- 
tered, these techniques provide a tool set with which to design Strand programs. 
All parallel programming in Strand is, in essence, a repeated use of these ideas. 

One-way communication between a producer and some number of consumers is 
achieved using the producer-consumers protocol; it is similar to a genie granting a 
set of wishes. The incomplete message protocol is based on this first technique but 
involves generating a reply from a consumer to the producer; this is analogous to 
giving the genie a tip. These two techniques are satisfactory for most programming 
needs; however, occasionally it is necessary to bound the number of messages 
that are produced but not consumed. This is achieved using the bounded buffer 
technique. 

A difference list is a representation of a list that can be constructed in parallel 
by a collection of processes; it is similar to a big LEGO brick toy. The technique 
involves maintaining the tail of a list as an accessible variable and allows lists to 
be concatenated in constant time. 

The short-circuit technique is used for detecting termination and involves 
threading a termination signal through a set of processes. This is achieved in 
much the same way as threading a chain through a set of convicts. The technique 
can be used to implement both task sequencing and complex testing predicates. 

Finally, the blackboard technique allows a set of processes to atomically access 
and update a single data structure. This is achieved using a predefined merger 
process. 


Exercises 


3.1 Design a producer process that generates 10 Ferrari messages, followed by 
5 BMW messages, followed by 15 Porsche messages. Each message includes 
the year and color of the vehicle. You may assume that all vehicles of a 
particular type are built in the same year with the same color. 


3.2 Design a process that consumes the vehicle messages generated in Prob- 
lem 3.1 and generates an output stream. The output stream should contain 
the same messages as the input stream except that the year associated with 
every vehicle is increased by one. 


3.3 Write two processes: juggler and assistant. The juggler initially sends one 
skittle to the assistant who returns it immediately. Then the juggler sends 
two skittles and the assistant sends them back in the reverse order. Finally, 
the juggler sends three skittles and gets these back in the reverse order. 
Generalize the process network to exchange any number of skittles. 


3.4 A printer and a server are connected via a common memory. The server 
places lines of information into the memory and the printer prints lines as 
they become available. If the memory is full, the server waits until the 
printer has printed a line; if the memory is empty, the printer waits for a 
line to be placed into the memory. Design a process network that simulates 
this protocol using a bounded buffer. 


3.5 Design three processes: dog_ages, cat_ages, and monkey_ages. Each process 
receives the same stream which contains three types of message: dog(Age), 
cat(Age), and monkey(Age). Each process records the ages of all animals of 
one particular type. Use the processes to write another process, all_ages, 
that takes the same input stream but records the ages of all the dogs, then 
all the cats, then all the monkeys. Hint: Use difference lists. 


3.6 Define the following processes: 


(a) counter: Receives a stream of elements of the form {Count,Contents} 
and fills in the Count entries with increasing integers beginning at 0. 


(b) switch: This process is to be written using difference lists. It takes a 
stream of instruction messages of the form move(A,B), store(A,Value), 
load(Value,B) and generates a stream of byte-code messages of the form 
{NewVariable,{P,Q,R,S}}. The values placed in P, Q, R and S are defined 
by the following table: 


move(A,B)      => 
store(A,Value) => 
load(Value,B)  => P=3, Q=Value, R=B, S=0 


Use these two processes to define another process assemble/2 that takes 
a stream of instruction messages and generates a stream of numbered 
byte-code messages. 


3.7 Extend the program in Problem 3.6 to deal with the following additional 
instruction messages: label(L) and jump(X). The label message causes nothing 
to be added to the output stream but records L as the number associated with 
the next byte-code message to be generated. The jump instruction causes 
the following byte-code message to be entered into the output stream: 


jump(X) => P=4, Q=the number of label X, R=0, S=0 


3.8 Modify the solution to Problem 3.7 to signal termination. 


3.9 Implement a testing predicate tuples5 that accepts a term as input and 
returns true if the term contains more than five tuples and false otherwise. 


3.10 Define a router process that takes four input streams and generates four 
output streams. It should accept messages, on any input stream, of the 
form route(N,M) and route the message M onto the Nth output stream (0 < 
N < 4). The process should record, on a blackboard, the route of the last 
five messages; other processes may send a query message to the router to 
obtain this information. 





Chapter 4 


Two Ways to Solve a 
Problem 


Strand supports two basic stylistic approaches to programming in the small. In the 
first, problem solving focuses on defining a collection of processes to form a process 
structure. This strategy isolates important functional units, encapsulates each unit 
within a process and expresses their interactions. The second approach emphasizes 
the definition of data structures and associated operations. This strategy considers 
the data manipulated in a problem and focuses on organizing this data. 


Two programming styles: 

• Process structures. 
• Data structures. 





This chapter takes a single problem and develops two solutions based on these 
approaches. The aim is both to contrast the approaches and to introduce key 
elements of programming style. Both solutions are developed using stepwise re- 
finement. 


4.1 The Paving Problem 


A number of isolated villages are linked by forest paths. These paths 
turn to quagmires in the winter, making travel all but impossible. The 
villagers eventually tire of traipsing through the mud and petition the 
government to provide all-weather access. After months of negotia- 
tions, the government reluctantly agrees but attempts to cut costs: 
They propose to pave only the minimum length of roads required to 
make any village reachable from any other. 

Two competing engineers are dispatched to give estimates on the 
cost of paving. Although their styles differ, both estimate the job 
using the same method. They consider each path in turn in order of 
increasing length. If a path permits a journey that could not have 
been made previously, they decide to pave that path; otherwise it is 
left unpaved. When all possible journeys can be made without muddy 
boots, they assess the cost based on the length of paths to be paved. 







Figure 4.1: The Isolated Villages 


The paving problem is well-known to computer scientists. It corresponds to the 
problem of finding a minimum-cost spanning tree for a graph; the villages in our 
story correspond to vertices in a graph and the paths to weighted edges. Figure 4.1 
shows a typical paving problem. It is recognizable as a weighted undirected graph: 
Numbers on the signposts signify weights associated with edges. 

The method employed by the engineers is Kruskal’s algorithm. This is a stan- 
dard method for computing a minimum-cost spanning tree for a graph G=(V,E), 
where V is a set of vertices and E a set of weighted edges. The algorithm maintains 


4.1. The Paving Problem 93 


a collection VS of disjoint sets of vertices; this initially contains one set for each 
element in V. Each set W in VS represents a connected set of vertices forming a 
spanning tree in the spanning forest represented by VS. Edges are chosen from E 
in order of increasing cost. Each edge (v,w) is considered in turn. If v and w are 
already in the same set in VS, the edge is discarded. If v and w are in distinct 
sets W1 and W2 (i.e., W1 and W2 are not connected), W1 and W2 are merged into 
a single set and (v,w) is added to the set of edges in the final spanning tree. The 
algorithm halts when VS contains a single set; this indicates that all vertices are 
connected. 

Let us view the paths illustrated in Figure 4.1 as a graph and apply Kruskal’s 
algorithm to compute the spanning tree. Table 4.1 illustrates the execution of the 
algorithm; at each step it shows the set VS, the edge considered and the result. If 
the result is yes, then the edge is added to the spanning tree; otherwise it is not. 


Table 4.1 Application of Kruskal’s Algorithm 


VS                                           Edge             Add? 
{Paris},{York},{Frinton},{Sydney},{Haast}    Frinton-Haast    yes 
{Paris},{York},{Frinton,Haast},{Sydney}      Paris-York       yes 
{Paris,York},{Frinton,Haast},{Sydney}        Frinton-Sydney   yes 
{Paris,York},{Frinton,Haast,Sydney}          Sydney-Haast     no 
{Paris,York},{Frinton,Haast,Sydney}          Frinton-Paris    yes 
{Paris,York,Frinton,Haast,Sydney}            done 


The shortest, and hence the first edge chosen, is that linking Frinton and Haast 
(length = 2). As no edges have been chosen at this point, Haast and Frinton are 
in disjoint sets. Thus, this edge is added to the set of edges in the spanning tree 
T. The next shortest edge is Paris-York (4). As Paris and York are in disjoint sets, 
this is also added to T. Sydney-Frinton (5) is selected next, and again is added. 
The fourth edge to be considered is Sydney-Haast (7). Sydney and Haast form 
part of the same set in VS due to the edges Frinton-Haast and Sydney-Frinton. 
Thus, this edge is not added to T. Finally, the Paris-Frinton road (8) is selected 
and added. The final spanning tree is: 


            4
    Paris ----- York
      |
      | 8
      |       2
    Frinton ----- Haast
      |
      | 5
      |
    Sydney
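The walkthrough above can be replayed mechanically. The following is a minimal Python sketch (not Strand, and not the program developed later in this chapter) of Kruskal's algorithm as described: VS is kept as a list of disjoint sets and edges are taken in order of increasing cost. The edge list contains only the five lengths quoted in the walkthrough; Figure 4.1 may show further paths. The names EDGES, VILLAGES and kruskal are our own.

```python
# Only the five path lengths quoted in the text; an assumption about Figure 4.1.
EDGES = [
    ("Frinton", "Haast", 2),
    ("Paris", "York", 4),
    ("Sydney", "Frinton", 5),
    ("Sydney", "Haast", 7),
    ("Paris", "Frinton", 8),
]
VILLAGES = ["Paris", "York", "Frinton", "Sydney", "Haast"]


def kruskal(vertices, edges):
    """Return the spanning-tree edges in the order they are chosen."""
    # VS initially contains one singleton set per vertex.
    vs = [frozenset([v]) for v in vertices]
    tree = []
    for v, w, cost in sorted(edges, key=lambda e: e[2]):
        if len(vs) == 1:
            break  # a single set remains: all vertices are connected
        w1 = next(s for s in vs if v in s)
        w2 = next(s for s in vs if w in s)
        if w1 == w2:
            continue  # v and w are already connected: discard the edge
        # Merge W1 and W2 into a single set and keep the edge.
        vs = [s for s in vs if s not in (w1, w2)] + [w1 | w2]
        tree.append((v, w, cost))
    return tree


if __name__ == "__main__":
    for edge in kruskal(VILLAGES, EDGES):
        print(edge)
```

Searching VS for the set that contains a vertex is linear in this sketch; the remainder of the chapter is largely concerned with better ways of organizing exactly this information.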




4.2 Stepwise Refinement 


Although small, the paving problem is sufficiently complex that some thought is 
required to reach a solution. To overcome the problem complexity we will employ 
stepwise refinement. Recall that this methodology develops a program through 
a sequence of refinement steps. The design process begins with an outline, or 
abstract specification, of the problem and at each step decomposes it into simpler 
subproblems. Design decisions concerning representational details are deferred for 
as long as possible. The aim of this process is to separate seemingly dependent 
aspects of the design so that they can be reasoned about independently. During 
the design process, data structures that form the interfaces between program com- 
ponents are also developed. Refinement eventually ends when each subproblem 
can be coded directly. 


Stepwise refinement:

• Develop program via a sequence of refinement steps.

• Refine program and data structures together.





Each refinement step involves a number of design decisions concerning how a task 
and its data are to be implemented. These decisions are based on a variety of cri- 
teria including efficiency and clarity. Decisions made in one step may subsequently 
turn out to be incorrect, inefficient or otherwise inappropriate. The programmer 
must then backtrack and reconsider earlier decisions. 


Backtracking is an essential part of the program-development process.





The next two sections apply stepwise refinement to develop both a process-oriented 
and a data-oriented implementation of Kruskal’s algorithm. 




4.3 A Process-Oriented Solution 







The first engineer employs two teams of rowdy villagers to help in 
making an estimate and begins by forming the villages into groups. 
Initially, each village is placed in a group on its own. Groups are 
combined when paths are selected for paving. The first team provides 
information about the paths between villages; the second keeps track 
of which villages are in the various groups. The engineer is able to ask 
the teams questions by screaming through a megaphone; the villagers 
shout back in reply. 

To decide which paths should be paved the engineer repeatedly asks 
the first team to find the next shortest path. Then the second team 
is asked to name the group containing the villages connected by the 
path. If the groups are different, the engineer asks the second team to 
combine the groups and decides to pave that path; otherwise the path 
is not paved. When all the villages are in a single group the engineer 
heads homeward with a paving plan. 


In programming terms, the first engineer’s approach is process-based. It focuses on 
the organization and interaction of teams of villagers. The engineer and villagers 
correspond to processes and requests for information correspond to communica- 
tion. 

We will now develop an implementation of Kruskal’s algorithm in the style of 
the first engineer. The initial refinement step for this program corresponds to the 
engineer’s initial division of responsibility and is specified as: 


spanning_tree(...) :-
    edges(...), sets(...), kruskal(...).




The edges and sets processes correspond to the engineer’s first and second teams. 
The kruskal process applies Kruskal’s algorithm and corresponds to the engineer. 

The edges and sets processes require information about edges and vertices. 
The kruskal process must be able to request information from these processes and 
output a solution. We therefore introduce arguments to the initial specification to 
permit the processes to interact: 


spanning_tree(Vertices,Edges,Solution) :-
    edges(Edges,ToEdges),
    sets(Vertices,ToSets),
    kruskal(ToEdges,ToSets,Solution).


The refined spanning_tree process has three arguments: The first, Vertices, is a
list of vertex names; the second, Edges, is a list of edges in the graph. The last
argument, Solution, is a communication channel used to output a solution.

The specification states that the edges process is initialized with the set of 
edges in the graph. The sets process is initialized with the vertices and the kruskal 
process yields the solution. The additional arguments ToEdges and ToSets are 
communication channels used by kruskal to interact with the other processes. 


4.3.1 Refining the Kruskal Process 


Three subproblems have been identified and assigned to processes: kruskal, sets 
and edges. As the kruskal process is to apply the central algorithm, it is natural 
to develop this component of the program first. 

At each step, Kruskal’s algorithm obtains a shortest edge (V1,V2) and finds 
the names of the sets in which the vertices V1 and V2 are contained. If the two 
sets are different, then the algorithm combines them and places the edge (V1,V2) 
into the solution; otherwise the edge is discarded. An outline specification of the 
actions to be performed in a single step is: 


kruskal(...) :-
    shortest(V1,V2),
    find(V1,Set1),
    find(V2,Set2),
    kruskal1(V1,V2,Set1,Set2,...).


kruskal1(V1,V2,Set1,Set2,...) :-
    Set1 =\= Set2 |
    union(Set1,Set2),
    add(V1,V2).
kruskal1(_,_,Set,Set,...).


The first rule obtains the shortest remaining edge and determines the names of
the sets containing its vertices. A kruskal1 process is created to determine whether
these are the same set. The first rule of kruskal1 deals with different sets: The
sets are combined and the edge is added to the solution. The second rule deals
with the case when the two sets are the same: It does nothing.

Recall that iteration is implemented using recursion in Strand. A program that 
performs the above actions repeatedly can be obtained by a refinement that causes 
the kruskal process to be re-executed once an edge is processed: 


kruskal(...) :-
    shortest(V1,V2),
    find(V1,Set1),
    find(V2,Set2),
    kruskal1(V1,V2,Set1,Set2,...).


kruskal1(V1,V2,Set1,Set2,...) :-
    Set1 =\= Set2 |
    union(Set1,Set2),
    add(V1,V2),
    kruskal(...).
kruskal1(_,_,Set,Set,...) :-
    kruskal(...).


This program implements the required iteration using two mutually recursive pro- 
cess definitions: The kruskal process changes state to kruskal1, which in turn 
changes state to kruskal. 

A further refinement is necessary to introduce a stopping condition for the 
recursion. The algorithm terminates when a single set of vertices remains. Each 
union operation reduces the number of sets by one; initially there is one set per 
vertex. Thus, the stopping condition can be implemented using a vertex count
which is decremented at each union operation. Termination occurs when the 
count reaches one: 


kruskal(Count,...) :-
    Count > 1 |
    shortest(V1,V2),
    find(V1,Set1),
    find(V2,Set2),
    kruskal1(Count,V1,V2,Set1,Set2,...).
kruskal(1,...).


kruskal1(Count,V1,V2,Set1,Set2,...) :-
    Set1 =\= Set2 |
    union(Set1,Set2),
    add(V1,V2),
    Count1 is Count - 1,
    kruskal(Count1,...).
kruskal1(Count,_,_,Set,Set,...) :-
    kruskal(Count,...).




The new second rule causes the kruskal process to terminate when the Count
argument is 1; a guard test Count > 1 in the first rule ensures that the second rule
is selected in this case. The first rule of kruskal1 decrements the count each time
a union operation is performed and passes the decremented count to the recursive
kruskal process; the second rule passes the count on unchanged.

The specification of Kruskal’s algorithm identifies the subproblems add, short- 
est, find and union. It is now necessary to refine these further. The first subtask, 
add, simply adds an edge to the solution. The other subtasks implement oper- 
ations on sets and edges. These latter operations are the responsibility of other 
processes and are therefore implemented as requests to these processes. The fol- 
lowing requests are required: 


“Sets: In what set is vertex X?” 
“Sets: Combine sets named S1 and S2.” 
“Edges: Return the shortest edge.” 


These requests can be represented in Strand by messages: 


find(Vertex?,Set↑)
union(Set1?,Set2?)
shortest(Village1↑,Village2↑)


The annotation ? is used here to indicate values provided by kruskal; the
annotation ↑ indicates a variable. The find and shortest requests are thus incomplete
messages: They contain variables that the other processes will use to return values.

Refinement of the kruskal process can now be completed. Additional arguments 
are provided to represent communication channels to the edges and sets processes; 
recall that these were introduced in the initial specification. These channels are 
used to forward find, union and shortest messages: 


kruskal(Count,ToEdges,ToSets,Solution) :-
    Count > 1 |
    ToEdges := [shortest(V1,V2)|ToEdges1],
    ToSets := [find(V1,Set1),find(V2,Set2)|ToSets1],
    kruskal1(Count,ToEdges1,ToSets1,Solution,V1,V2,Set1,Set2).
kruskal(1,ToEdges,ToSets,Solution) :-
    Solution := [], ToEdges := [], ToSets := [].


kruskal1(Count,ToEdges,ToSets,Solution,V1,V2,Set1,Set2) :-
    Set1 =\= Set2 |
    ToSets := [union(V1,V2)|ToSets1],
    Count1 is Count - 1,
    Solution := [{V1,V2}|Solution1],
    kruskal(Count1,ToEdges,ToSets1,Solution1).
kruskal1(Count,ToEdges,ToSets,Solution,_,_,Set,Set) :-
    kruskal(Count,ToEdges,ToSets,Solution).




Note how the abstract operations in the outline have been replaced by message 
sending. For example, the shortest(V1,V2) operation is represented by an assign- 
ment that forwards a message to the edges process: 


ToEdges := [shortest(V1,V2)|ToEdges1] 


In the story, the first engineer solves the problem by repeatedly requesting a 
shortest path (edge) from the first team. In the program, this is achieved using 
the shortest message which retrieves the edge between two vertices (villages). The 
engineer then queries the second team to determine the group (set) in which these 
villages are located. In the program, this is achieved using two find messages. 
Finally, the engineer may request the second team to combine the two groups; in 
the program, this is achieved using the union message. 
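
The request-and-reply pattern just described can be imitated outside Strand. The sketch below is a Python approximation under several assumptions: the two teams are modeled as objects with a send method that handles each message immediately, whereas Strand processes run concurrently and communicate over streams; and an incomplete message is modeled by a message that carries an empty Reply slot which the receiver fills in. The names Reply, EdgesProcess, SetsProcess and send are hypothetical and appear nowhere in the Strand program.

```python
class Reply:
    """A single-assignment slot, standing in for an unbound Strand variable."""

    def __init__(self):
        self.value = None
        self.bound = False

    def assign(self, value):
        if self.bound:
            raise ValueError("variable already assigned")
        self.value, self.bound = value, True


class EdgesProcess:
    """First team: answers shortest(V1,V2) with the next shortest edge."""

    def __init__(self, edges):
        self.remaining = sorted(edges, key=lambda e: e[2])

    def send(self, message):
        _kind, v1, v2 = message          # ("shortest", Reply, Reply)
        a, b, _cost = self.remaining.pop(0)
        v1.assign(a)
        v2.assign(b)


class SetsProcess:
    """Second team: services find(Vertex,Set) and union(V1,V2)."""

    def __init__(self, vertices):
        self.set_of = {v: n for n, v in enumerate(vertices, start=1)}

    def send(self, message):
        if message[0] == "find":
            _, vertex, reply = message
            reply.assign(self.set_of[vertex])
        elif message[0] == "union":
            _, v1, v2 = message
            old, new = self.set_of[v1], self.set_of[v2]
            for v in list(self.set_of):
                if self.set_of[v] == old:
                    self.set_of[v] = new


def kruskal(count, to_edges, to_sets):
    """The engineer: loops until a single group remains (count == 1)."""
    solution = []
    while count > 1:
        v1, v2 = Reply(), Reply()
        to_edges.send(("shortest", v1, v2))
        s1, s2 = Reply(), Reply()
        to_sets.send(("find", v1.value, s1))
        to_sets.send(("find", v2.value, s2))
        if s1.value != s2.value:
            to_sets.send(("union", v1.value, v2.value))
            solution.append((v1.value, v2.value))
            count -= 1
    return solution
```

Because every send returns only after the message has been handled, this sketch cannot exhibit the message-ordering problems that arise between concurrent processes. It assumes a connected graph; otherwise the edges process runs out of edges before the count reaches one.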


4.3.2 Managing Sets 


Our initial specification stated merely that the sets process should manage infor- 
mation about sets. The role of this process has been clarified while refining the 
kruskal process: It must service find and union requests. We thus require a view 
of sets that permits a sequence of these operations to be performed efficiently. 
Consider the following set of sets: 


{ {a,b,c}, {d,e,f,g}, {h} }


We will represent this as a forest in which leaves represent set members and the 
root of each tree specifies a set name. Edges are directed from the leaves to the 
root and sets are named using numbers (1, 2, 3). 


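[Figure: a forest of three trees with roots 1, 2 and 3; leaves a, b, c under root 1, leaves d, e, f, g under root 2, and leaf h under root 3]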


A union operation combines two sets to form a single set. This is achieved by 
making one tree a subtree of the other (an optimal solution would make the smaller 
tree the subtree). For example, combining sets 1 and 2 yields the following forest. 


[Figure: the forest after sets 1 and 2 are combined]




A find operation determines the name of the set in which an element resides. This 
is achieved by following tree branches from the element to a root. During a find 
operation, the tree is compacted to avoid unnecessary intermediate nodes. For 
example, execution of the operation find(a,S) yields S=2 and the following new 
forest. 


[Figure: the compacted forest after find(a,S)]
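
As a rough illustration of these two operations, here is a Python sketch in which the forest is stored as a dictionary mapping each node to its parent, with a root mapped to itself. This representation and the names parent, find and union are assumptions made for the sketch; the Strand design that follows uses processes and streams instead.

```python
def find(parent, x):
    """Return the root (set name) above x, compacting the path on the way."""
    # First pass: follow branches from the element up to the root.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: attach every node on the path directly to the root.
    while parent[x] != root:
        next_node = parent[x]
        parent[x] = root
        x = next_node
    return root


def union(parent, s1, s2):
    """Combine two sets by making the tree rooted at s1 a subtree of s2."""
    parent[s1] = s2


# The forest { {a,b,c}, {d,e,f,g}, {h} } with sets named 1, 2 and 3.
parent = {1: 1, 2: 2, 3: 3}
parent.update({v: 1 for v in "abc"})
parent.update({v: 2 for v in "defg"})
parent["h"] = 3
```

This sketch always attaches the first tree beneath the second; choosing the smaller tree as the subtree, as suggested above, would require recording tree sizes as well.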


It is clear from this discussion that a sequence of find and union requests can 
be processed using just two operations: Merging two trees and following a path 
from a leaf to a root. A process structure that provides these capabilities can be 
implemented as follows. Each set name and element is encapsulated in a process; 
each branch in the tree is implemented by a stream between two processes. We 
now consider the implementation of the find and union operations. 

Find. The find operation is implemented by message sending. An incomplete 
message find(Set) is sent to the leaf corresponding to the element for which the Set is 
required. Any intermediate process receiving this message forwards it. Eventually, 
it is received by a root process which assigns a value to Set. 

This solution requires the ability to send a message to a named leaf node. 
We therefore introduce a distributor process that records streams to leaf processes. 
Sending a message {Name,Request} to the distributor results in a Request being 
forwarded to the leaf process Name. For example, a message {a,find(Set)} received 
by the distributor causes it to forward the message find(Set) to process a. 


Union. The union operation is also implemented by message sending. Recall 
that this operation combines two trees. The basic idea is to make the root pro- 
cess of one tree forward messages to the root process of the other. To illustrate 
the messages used to achieve this organization, we describe the messages sent to 
achieve the union shown above. Initially, the two process structures to be merged 
are: 


[Figure: the two process trees, with leaves a, b, c beneath one root and leaves d, e, f, g beneath the other]


The sets containing a and d are combined by sending the messages {d,merge(S)} 
and {a,union(S)} to the distributor. Messages are then forwarded to the leaves: 




[Figure: union(S) arriving at leaf a and merge(S) arriving at leaf d, both from the distributor]


These messages are forwarded by the leaf processes to their respective root pro- 
cesses: 


[Figure: union(S) arriving at the root above a, b, c and merge(S) arriving at the root above d, e, f, g]


A root process that receives a merge(S) message creates a new input stream S: 


[Figure: the root above d, e, f, g with a new input stream S; union(S) is still pending at the root above a, b, c]


A root process that receives a union(S) message changes state to a process that 
forwards messages, received on its input streams, on S: 





Observe that subsequent find messages sent to leaf processes in Set 1 will be for- 
warded to the root of Set 2; in effect, all elements of Set 1 have become members 
of Set 2. 


Notice that the leaf processes perform no useful function; they simply route 
messages to root processes. We can in fact remove them; the distributor then 
forwards messages directly to the appropriate root process. In effect, leaf nodes 
are represented by entries in the distributor. 
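
The simplified structure, with the distributor routing directly to root processes, can be approximated as follows. This Python sketch is only an analogy: a stream into a root is represented by a reference to the root object itself, the shared variable S is represented by a one-element list that merge fills and union reads, and messages are handled one at a time in the order they are sent, so merge must be sent before the matching union. SetProcess, Distributor and send are hypothetical names.

```python
class SetProcess:
    """A root process: answers find until a union turns it into a forwarder."""

    def __init__(self, name):
        self.name = name
        self.forward_to = None  # set when a union(S) message is received

    def send(self, message):
        if self.forward_to is not None:
            # After a union, every message is passed on to the other root.
            self.forward_to.send(message)
        elif message[0] == "find":
            message[1].append(self.name)      # reply with the set name
        elif message[0] == "merge":
            message[1].append(self)           # S now leads into this root
        elif message[0] == "union":
            self.forward_to = message[1][0]   # forward all input on S


class Distributor:
    """Routes {Name,Request} pairs; a leaf is just an entry in the table."""

    def __init__(self, names):
        self.route = {v: SetProcess(n) for n, v in enumerate(names, start=1)}

    def send(self, name, request):
        self.route[name].send(request)


def combine(distributor, v1, v2):
    """Merge the set containing v2 into the set containing v1."""
    s = []
    distributor.send(v1, ("merge", s))
    distributor.send(v2, ("union", s))
```

Note that the distributor's table is never updated: an element keeps routing to its original root, and it is the chain of forwarding roots that leads to the current set name.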




4.3.3 The Sets Process 


We now present a Strand implementation of the process structure defined in
Section 4.3.2. Recall that each vertex is initially in a set on its own. As in the
diagrams given previously, we name these sets 1,...,N. The following program 
creates a process structure representing the initial graph: 


sets(Vs,ToVs) :-
    create_sets(Vs,1,Ss),
    distributor(ToVs,Ss).


create_sets([V|Vs],N,Ss) :-
    N1 is N + 1,
    Ss := [{V,ToSet}|Ss1],
    set(N,ToSet),
    create_sets(Vs,N1,Ss1).
create_sets([],_,Ss) :- Ss := [].


The sets process is decomposed into two processes: create_sets and distributor. The 
first generates the initial forest; this consists of one set process for each vertex in the 
initial graph. It also generates a list of pairs with the form {VertexName,SetStream}. 
This list specifies the initial correspondence between vertices and sets. The distrib- 
utor process is responsible for routing messages to set processes. Its operation has 
been described previously and its implementation is left as an exercise (Exercise
4.3).

A set process can receive find, union and merge requests. It responds to a find
request with its name. It processes a union request by ensuring that subsequent
messages are forwarded on a supplied stream. It handles merge requests by creating
a new input stream.

Repeated union requests can result in many input streams to the same set pro- 
cess; this indicates that merging is required. We therefore associate a predefined 
merger process with each set: 


set(Name,ToSet) :- merger(ToSet,ToSet1), set1(Name,ToSet1).


Recall from Chapter 3 that a merger creates a new input stream S when it receives 
a message of the form merge(S). This is precisely the behavior required in response 
to a merge message. Thus, the set1 process only needs to service find and union
requests: 


set1(Name,[find(Name1)|In]) :-
    Name1 := Name, set1(Name,In).
set1(_,[union(Other)|In]) :- Other := In.
set1(_,[]).


The first rule services a find request by returning the set Name. A find request is an
incomplete message. Recall from Section 3.1.2 that assigning a value to a variable
in an incomplete message has the effect of routing a reply to the sender of that
message. This program exemplifies the combination of mergers and incomplete
messages described in Section 3.1.3; a set1 process can reply to a find request
without knowing the identity of the sending process. The second rule processes
a union request that contains a stream to another set process. Any messages
arriving on the set input stream are forwarded to the other process. The last rule
terminates the set process when its input stream is closed.


4.3.4 Backtracking 


Recall that the kruskal process was defined to generate requests in the form 
find(Vertex,Set) and union(Vertex1,Vertex2). However, the distributor has subse- 
quently been defined to accept requests of the form {Vertex,Message}. Further- 
more, the set processes have been defined to accept the messages union(Stream) 
and merge(Stream) rather than the single message union(Vertex1,Vertex2). Hence, 
it is necessary to backtrack in the design process and redefine kruskal to generate 
the appropriate requests. 

The program developed in previous refinements does not guarantee the order 
in which messages arrive at set processes. In consequence, it is possible for a find 
message to arrive at a set before a union message generated in a previous iteration 
of Kruskal’s algorithm. This would result in the return of the wrong set name. 
We solve this problem by causing the set process to acknowledge the receipt of 
a union message. In addition, we cause the kruskal process to delay generation 
of further messages until an acknowledgment is received. Minor changes to the 
program implement this solution. A variable D is added to the union message, 
which now has the form union(S,D). The set process is modified to assign a value 
to this variable when it receives the message. The kruskal process delays generating 
further messages until this variable is assigned a value. 

This final refinement to our program is representative of a common Strand 
program-development technique: First define the processes in a system and their 
interactions without concern for the sequencing of activities; then specialize the 
program to synchronize process interactions where required. 


e First specify process actions. 


e Then specialize the program 
to synchronize activities. 







4.3.5 The Process-Oriented Program 


Program 4.1 is the final process-oriented solution to the paving problem. It defines 
a set of processes and specifies how these processes interact to implement Kruskal’s 
algorithm. 


spanning_tree(Vs,Cnt,Es,Soln) :-
    edges(Es,ToEs), sets(Vs,ToVs), kruskal(Cnt,ToEs,ToVs,Soln,done).


kruskal(Cnt,Es,Vs,Soln,done) :-
    Cnt > 1 |
    Es := [shortest(V1,V2)|Es1],
    Vs := [{V1,find(S1)},{V2,find(S2)}|Vs1],
    kruskal1(Cnt,Es1,Vs1,Soln,V1,V2,S1,S2).
kruskal(1,Es,Vs,Soln,_) :- Soln := [], Es := [], Vs := [].


kruskal1(Cnt,Es,Vs,Soln,V1,V2,S1,S2) :-
    S1 =\= S2 |
    Vs := [{V1,merge(S)},{V2,union(S,D)}|Vs1],
    Soln := [{V1,V2}|Soln1], Cnt1 is Cnt - 1,
    kruskal(Cnt1,Es,Vs1,Soln1,D).
kruskal1(Cnt,Es,Vs,Soln,_,_,S,S) :-
    kruskal(Cnt,Es,Vs,Soln,done).


sets(Vs,ToVs) :- create_sets(Vs,1,Ss), distributor(ToVs,Ss).


create_sets([V|Vs],N,Ss) :-
    Ss := [{V,ToSet}|Ss1], N1 is N + 1,
    set(N,ToSet), create_sets(Vs,N1,Ss1).
create_sets([],_,Ss) :- Ss := [].


set(Name,ToSet) :- merger(ToSet,ToSet1), set1(Name,ToSet1).


set1(Name,[find(Name1)|In]) :- Name1 := Name, set1(Name,In).
set1(_,[union(Other,D)|In]) :- Other := In, D := done.
set1(_,[]).


Program 4.1: A Process-Oriented Spanning Tree Program 


Program 4.1 uses both the producer-consumers and incomplete message tech- 
niques presented in Chapter 3. Various processes communicate via streams; the 
kruskal process obtains information from the edges and sets processes using in- 
complete messages. The program is not complete; the missing distributor and edges 
processes are left as exercises. 




4.4 A Data Structure Solution 





The second engineer has a different style and employs a team of diligent 
clerks rather than a crowd of rowdy villagers. The engineer begins 
estimation by drawing up two documents and a set of administrative 
procedures. One of the documents contains villages in its first column 
and numbers in its second; initially, each village is associated with a 
unique number. The second document contains a list of paths. 

There are three procedures that the engineer may ask a clerk to 
perform. The first determines the next shortest path recorded in the 
path document. The second ascertains the number associated with a 
village; this involves scanning through the village document. The last 
procedure is applied to the same document and changes all occurrences 
of one number; this involves making a completely new copy. 

The engineer repeatedly asks a clerk to find the shortest remaining 
path. The path specifies two villages and is obtained using the first 
procedure. The engineer then asks clerks to find the number associated 
with these villages. This is achieved using the second procedure. If the 
numbers are not the same, the engineer asks a clerk to apply the third 
procedure and paves this path. When the village document contains 
only one number the engineer returns home with a paving plan. 


The approach used by the second engineer is data-structure oriented. The prin- 
cipal concern is how to represent and manipulate data. The documents in the 
story represent data structures; the administrative procedures applied by clerks 
represent operations on these structures. The engineer applies Kruskal’s algo- 
rithm as before, but coordinates data-structure modifications rather than process 
interactions. Thus, an initial refinement of the problem is: 




spanning_tree(Vs,Cnt,Es,Soln) :-
    build_edges_structure(Es,Edges),
    build_set_structure(Vs,Sets),
    kruskal(Cnt,Edges,Sets,Soln).


The structure of this initial specification is similar to that developed for the first 
engineer. However, the variables Edges and Sets now represent data structures 
(documents) rather than streams. 

As the second engineer also uses Kruskal’s algorithm, an abstract specification 
for kruskal can be developed in much the same way as that developed in Section 4.3. 
The development is not pursued in detail; only the end result is shown: 


kruskal(Cnt,Es,Sets,Soln) :-
    Cnt > 1 |
    shortest(Es,V1,V2,Es1),
    find(V1,Sets,Set1),
    find(V2,Sets,Set2),
    kruskal1(Cnt,Es1,Sets,Soln,{V1,V2},Set1,Set2).
kruskal(1,_,_,Soln) :-
    Soln := [].


kruskal1(Cnt,Es,Sets,Soln,Edge,Set1,Set2) :-
    Set1 =\= Set2 |
    union(Set1,Set2,Sets,Sets1),
    Cnt1 is Cnt - 1,
    Soln := [Edge|Soln1],
    kruskal(Cnt1,Es,Sets1,Soln1).
kruskal1(Cnt,Es,Sets,Soln,_,Set,Set) :-
    kruskal(Cnt,Es,Sets,Soln).


Compare this program with that developed in Section 4.3. The differences arise 
because the find, union and shortest operations are viewed here as operations on 
data-structures; in Program 4.1, they were viewed as messages to processes. The 
refinement of the program design has introduced two data structures, sets (Sets)
and edges (Es), and three associated operations: union, find, and shortest. We
now refine the design of the sets structure and the union and find operations.


4.4.1 The Sets Data Structure 


The sets data structure contains information about vertex sets and must support 
find and union operations. These operations can be specified as follows: 




find(Vertex,Sets,Set). Determines the Set containing Vertex in the data 
structure Sets. 


union(Set1,Set2,Sets,NewSets). When applied to a Sets data structure, 
returns as NewSets the modified structure in which Set1 and Set2 are 
combined. 


When designing data structures and related operations it is necessary to be aware 
of some constraints imposed by Strand’s computational model. Only two types of 
structured data are provided: lists and tuples. In addition, the single-assignment 
rule prevents the modification of data structures. In consequence, copying must 
be used to achieve the effect of a modification. Strand data structures are often 
designed to minimize the amount of copying required to perform an update. In re- 
fining the program further we will first utilize a simple representation and shall not 
be concerned with the amount of copying performed; later an improved solution 
will be developed that reduces copying. 

Lists are the simplest and most flexible data structure. A set of vertex sets 
can be represented as a list of vertex-set pairs each with the form {Vertex,Set}. 
Each entry in the list indicates the set to which a particular vertex belongs. This 
structure corresponds to the villages document in the story. The following list 
represents an initial villages document: 


[{paris,1},{york,2},{haast,3},{frinton,4},{sydney,5}] 


The list data structure is attractive because it is easy to write programs that 
recurse over all elements. For example, the find operation can be implemented by 
a process that examines each list element in turn until a named item is encountered: 


find(Name,[{Name1,_}|Rest],Set1) :-
    Name =\= Name1 |
    find(Name,Rest,Set1).
find(Name,[{Name,Set}|_],Set1) :-
    Set1 := Set.


An equally succinct specification can be provided for the union operation: 


union(Set1,Set2,[{Name,Set1}|Rest],Sets) :-
    Sets := [{Name,Set2}|Sets1],
    union(Set1,Set2,Rest,Sets1).
union(Set1,Set2,[{Name,Set}|Rest],Sets) :-
    Set =\= Set1 |
    Sets := [{Name,Set}|Sets1],
    union(Set1,Set2,Rest,Sets1).
union(_,_,[],Sets) :- Sets := [].




This process recurses along the list of sets and constructs a new list in which one 
set name is replaced by another. For example, consider the process: 


    union(2,3,Sets,NewSets)

If Sets has the value:

    [{paris,1},{york,2},{haast,3},{frinton,4},{sydney,5}]

then evaluation of this process computes the result:

    NewSets = [{paris,1},{york,3},{haast,3},{frinton,4},{sydney,5}]
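
For comparison, the same two list operations can be written in Python. This is only an illustrative sketch: {Vertex,Set} tuples become Python pairs, and, as in Strand, union builds a new list and leaves the old one untouched. Raising KeyError for an unknown vertex is our own choice; the Strand find has no rule for the empty list.

```python
def find(name, sets):
    """Scan the list until the named vertex is found; return its set."""
    for vertex, set_name in sets:
        if vertex == name:
            return set_name
    raise KeyError(name)  # assumption: the Strand version has no such case


def union(set1, set2, sets):
    """Return a new list in which every occurrence of set1 is replaced by set2."""
    return [(vertex, set2 if s == set1 else s) for vertex, s in sets]


villages = [("paris", 1), ("york", 2), ("haast", 3), ("frinton", 4), ("sydney", 5)]
new_villages = union(2, 3, villages)
```

Both functions visit list elements one at a time, and union copies every element, which is the cost that a tree-based representation is intended to reduce.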


Finally, we provide a program to initialize the sets data structure given a list of 
vertices: 


build_set_structure(Vertices,Sets) :-
    sets(Vertices,1,Sets).


sets([Vertex|Vs],Cnt,Sets) :-
    Sets := [{Vertex,Cnt}|Sets1],
    Cnt1 is Cnt + 1,
    sets(Vs,Cnt1,Sets1).
sets([],_,Sets) :-
    Sets := [].


4.4.2 Program using Data Structures 


The program developed for the data-structure solution to the paving problem 
is presented in Program 4.2. This program lacks only the process definition for 
build_edges_structure; this is left as an exercise (Exercise 4.7).


4.4.3 Improving the Data Structure Solution 


Program 4.2 is simple but inefficient. The source of the inefficiency is the list 
structure used to represent the set of vertex sets. Unfortunately, both the find and 
union operations can only be implemented by scanning through this list. Further- 
more, the union operation must copy the entire list. Both these operations take 
time linear in the number of vertices; hence, they are expensive when the number 
of vertices is large. In this section, we explore an alternative representation for 
this data structure based on trees. This reduces the cost of both scanning and
copying.


spanning_tree(Vs,Cnt,Es,Soln) :-
    build_edges_structure(Es,Edges),
    build_set_structure(Vs,Sets),
    kruskal(Cnt,Edges,Sets,Soln).


kruskal(Cnt,Es,Sets,Soln) :-
    Cnt > 1 |
    shortest(Es,V1,V2,Es1),
    find(V1,Sets,Set1), find(V2,Sets,Set2),
    kruskal1(Cnt,Es1,Sets,Soln,{V1,V2},Set1,Set2).
kruskal(1,_,_,Soln) :- Soln := [].


kruskal1(Cnt,Es,Sets,Soln,Edge,Set1,Set2) :-
    Set1 =\= Set2 |
    Cnt1 is Cnt - 1, Soln := [Edge|Soln1],
    union(Set1,Set2,Sets,Sets1),
    kruskal(Cnt1,Es,Sets1,Soln1).
kruskal1(Cnt,Es,Sets,Soln,_,Set,Set) :-
    kruskal(Cnt,Es,Sets,Soln).


find(Name,[{Name1,_}|Rest],Set1) :-
    Name =\= Name1 | find(Name,Rest,Set1).
find(Name,[{Name,Set}|_],Set1) :- Set1 := Set.


union(Set1,Set2,[{Name,Set1}|Rest],Sets) :-
    Sets := [{Name,Set2}|Sets1], union(Set1,Set2,Rest,Sets1).
union(Set1,Set2,[{Name,Set}|Rest],Sets) :-
    Set =\= Set1 |
    Sets := [{Name,Set}|Sets1], union(Set1,Set2,Rest,Sets1).
union(_,_,[],Sets) :- Sets := [].


build_set_structure(Vertices,Sets) :- sets(Vertices,1,Sets).
sets([Vertex|Vs],Cnt,Sets) :-
    Sets := [{Vertex,Cnt}|Sets1], Cnt1 is Cnt + 1,
    sets(Vs,Cnt1,Sets1).
sets([],_,Sets) :- Sets := [].


Program 4.2: A Data-Oriented Spanning-Tree Program 




Avoiding Scanning. A simple mechanism can be used to avoid scanning 
when sets are combined. An index is created to record the contents of each set. A 
set of sets is now represented by two structures. The first, Vertices, maps vertices 
to sets and the second, Index, maps sets to vertices. For example: 





Given these data structures, two sets A and B can be combined using the following 
algorithm: 


1. Consult the Index to determine the contents of B. 


2. Change the Vertices entry for each vertex in set B to contain the name of set 
A. 


3. Add the members of set B to the Index entry for set A. 


For example, given the vertices and index above, the following structures result 
when sets 2 and 3 are combined: 


[Figure: the Vertices and Index structures after sets 2 and 3 are combined]





Notice that the third step of the union algorithm requires that the index entries 
for two sets be combined. If index entries are represented as difference lists, this 
operation can be achieved in constant time. 

Both the Vertices and Index structures consist of sets of pairs. Each pair can be
represented using a tuple of the form {Name,Value}. Although we do not present
an implementation of the union algorithm here, we note that it can be expressed
in terms of two operations on sets: find_value and update_value. The first retrieves
the Value associated with a given Name; the second generates a new structure in
which a different Value is associated with a Name.
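
The three-step union algorithm can be sketched in Python, with dictionaries standing in for the two sets of {Name,Value} pairs. This is an illustration under stated assumptions rather than the Strand implementation, which is not given in the text: removing the Index entry for set B is our own choice (the algorithm only says that B's members are added to A's entry), and the example data below is invented.

```python
def union(a, b, vertices, index):
    """Combine set b into set a, returning new (vertices, index) structures."""
    members_b = index[b]                      # step 1: contents of B
    new_vertices = dict(vertices)
    for v in members_b:                       # step 2: re-label B's vertices
        new_vertices[v] = a
    # Assumption: B's entry is dropped once its members belong to A.
    new_index = {name: ms for name, ms in index.items() if name != b}
    new_index[a] = index[a] + members_b       # step 3: extend A's entry
    return new_vertices, new_index


# Invented example data: three villages, each initially in its own set.
vertices = {"paris": 1, "york": 2, "haast": 3}
index = {1: ["paris"], 2: ["york"], 3: ["haast"]}
new_vertices, new_index = union(2, 3, vertices, index)
```

Only the members of set B are visited in step 2, so no scan of the whole vertex structure is needed to decide which entries change. Python list concatenation in step 3 is linear in the list length, however; it does not reproduce the constant-time behavior of difference lists, and copying the dictionaries is still a full copy.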

Avoiding Copying. Recall that a representation of sets as lists requires that 
an entire list be copied to update a single element. An alternative representation 
as an ordered binary tree avoids much of this overhead. Here is an example of such 
a structure: 


              {haast,1}
             /         \
    {frinton,2}       {sydney,3}
                      /        \
               {paris,4}      {york,5}


Each node in this tree contains an element of the set in question plus a left and 
right subtree. Values in the left subtree are always less than the value at the 
current node; values in the right subtree are greater. It is hence possible to find 
a value in a tree by starting at the root and repeatedly executing the following 
actions. 


• Terminate if the current node has the desired value.

• Move to the left node if the required value is “less” than the current value.

• Move to the right node if the required value is “greater” than the current
value.


“Less” and “greater” correspond to some ordering on names; for example, alpha- 
betic. Thus, in the tree given previously a search for sydney proceeds by first 
examining the root node (haast); as sydney is greater than haast alphabetically, 
the right node (sydney) is examined. The search then terminates. 

A binary tree can be represented in Strand using terms of the form: 


{Name,Value,LeftTree,RightTree}


The third and fourth components of this term are either trees or empty lists, which 
signify leaf nodes. Thus, the example tree shown earlier is represented as: 


{haast,1,
    {frinton,2,[],[]},
    {sydney,3,
        {paris,4,[],[]},
        {york,5,[],[]}
    }
}


Given this representation, the find_value operation can now be defined to perform 
the necessary search: 


112 Chapter 4. Two Ways to Solve a Problem 


find_value(Name,{Name,Value1,_,_},Value) :-
    Value := Value1.
find_value(Name,{Name1,_,L,_},Value) :-
    Name @< Name1 | find_value(Name,L,Value).
find_value(Name,{Name1,_,_,R},Value) :-
    Name >@ Name1 | find_value(Name,R,Value).


The first rule locates and returns a named item. The second and third rules 
continue the search in the left and right subtrees, respectively. Strand’s predefined 
lexical ordering primitives >@ and @< are used to define an ordering on names. 
(These define a total ordering over all terms; see Appendix 1). 
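
For readers who want to trace the search by hand, the same three cases can be sketched in Python. This is an illustration only: tuples of the form (name, value, left, right) stand in for Strand tuples, None stands in for the empty list, Python's string comparison stands in for @< and >@, and returning None for a missing name is an assumption (the Strand definition has no rule for that case).

```python
def find_value(name, tree):
    """Return the value stored under name, or None if name is absent."""
    while tree is not None:
        node_name, value, left, right = tree
        if name == node_name:          # first rule: the item is found
            return value
        # second and third rules: continue in the left or right subtree
        tree = left if name < node_name else right
    return None


# The example tree from the text, in tuple form.
TREE = ("haast", 1,
        ("frinton", 2, None, None),
        ("sydney", 3,
         ("paris", 4, None, None),
         ("york", 5, None, None)))
```

A search for paris visits haast, then sydney, then paris, so three nodes are examined out of five.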

The update_value operation can be defined in a similar manner. In addition 
to searching the tree, it also creates new copies of tree nodes visited during the 
search and modifies the tree at the appropriate node. For example, the result of 
updating the value of the node named sydney from 3 to 2 is illustrated in the 
following diagram. The drawing on the left shows the original tree; that on the 
right shows only those nodes that are copied. 


          {haast,1}                      {haast,1}
          /       \                      /       \
{frinton,2}       {sydney,3}                     {sydney,2}
                  /       \                      /       \
          {paris,4}       {york,5}


A definition for the update_value operation can be derived by specializing the
find_value definition. One additional argument is added to represent the new tree:


update_value(Nm,Val,{Nm,_,L,R},NT) :-
    NT := {Nm,Val,L,R}.
update_value(Nm,Val,{Nm1,V,L,R},NT) :-
    Nm @< Nm1 |
    NT := {Nm1,V,L1,R},
    update_value(Nm,Val,L,L1).
update_value(Nm,Val,{Nm1,V,L,R},NT) :-
    Nm >@ Nm1 |
    NT := {Nm1,V,L,R1},
    update_value(Nm,Val,R,R1).


This completes the definition of the new data structure and operations. Locating 
a value in a balanced binary tree of N nodes requires that at most O(logN) nodes 
be visited. Furthermore, a value in a tree can be modified by copying the same 
number of nodes. This is a considerable improvement over the list representation, 
which required on average O(N) operations to both find and update a value. 
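
The claim that an update copies only the nodes on the search path can be checked with a small Python sketch. As before, this is an illustration under stated assumptions: tuples (name, value, left, right) stand in for Strand tuples, None for the empty list, and raising KeyError for a missing name is a choice made here, not something the Strand definition specifies.

```python
def update_value(name, value, tree):
    """Return a new tree in which name maps to value.

    Only nodes on the path from the root to name are rebuilt;
    every other subtree is shared with the original tree.
    """
    if tree is None:
        raise KeyError(name)
    node_name, old, left, right = tree
    if name == node_name:
        return (node_name, value, left, right)
    if name < node_name:
        return (node_name, old, update_value(name, value, left), right)
    return (node_name, old, left, update_value(name, value, right))


OLD = ("haast", 1,
       ("frinton", 2, None, None),
       ("sydney", 3,
        ("paris", 4, None, None),
        ("york", 5, None, None)))
NEW = update_value("sydney", 2, OLD)
```

After the update, NEW has fresh haast and sydney nodes, while the frinton, paris and york subtrees are the very same objects as in OLD, and OLD itself is unchanged.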




We conclude this discussion with some general statements concerning data 
structures. Strand provides two structured data types: lists and tuples. These can 
be used to develop a variety of tree structures. The following principles can be 
used to decide which structure to use in a particular situation. 


Data Structure Design:


• Lists are easier to manipulate than tuples or trees.

• It is easy to iterate over all elements of a list.

• Tuples provide “random access” to data.

• Tuples are more efficient in space than lists or trees.

• Trees permit efficient access and update.





4.5 Concluding Remarks 


This chapter has developed two solutions to the problem of finding a minimum-cost 
spanning tree. Both solutions use Kruskal’s algorithm but their implementation 
styles are very different. The first, process-oriented solution, is based on the speci- 
fication of process structures. It was developed by isolating important components 
of the algorithm, encapsulating these in processes and specifying the process inter- 
actions. The second, data-oriented solution, is based on the specification of data 
structures and associated operations. Its development emphasized the design of 
data structures to represent the important problem components. These two
programming styles are complementary; although this chapter has emphasized their
differences, both are generally employed to solve a single problem.


Process-Oriented Style: 
Refinement identifies processes 
and their interactions. 


Data-Oriented Style: 
Refinement identifies data 
structures and associated 
operations. 







Data structures often provide a more efficient representation than processes; al- 
though lightweight, processes are more substantial than a single list or tuple cell. 
However, Strand’s single-assignment rule means that data structures can only be 
updated by copying. In contrast, processes provide good encapsulation properties 
and can be used to implement mutable structures. 


Exercises 


4.1 Program 4.1 does not compact the tree used to represent a set, as illustrated
    in Section 4.3.2. In consequence, repeated union operations can result in
    requests being forwarded along chains of processes. Modify the program to
    perform compaction.


4.2 Set union is a source of overhead in Program 4.1, despite the optimization
    presented in Exercise 4.1. Improve the program by ensuring that the smaller
    of the two sets being combined becomes the subtree.


4.3 Implement the distributor process required in Program 4.1. It should forward
    messages on any one of N named streams in O(logN) time.


4.4 Write a program to generate a balanced binary tree from an unordered list
    of numbers. Also provide processes to delete and add elements in the tree
    while keeping it balanced.


4.5 A balanced binary tree of N elements permits both access to and modification
    of a named element in O(logN) time. However, the tree structure represents
    a significant space overhead. Space utilization can be reduced by storing
    several elements at each node. Write procedures to create, access and modify
    a tree with five elements stored at each node.


4.6 A tuple can be used to provide random access to data if keys are small
    integers. When keys are not small integers, a hash function applied to a key
    can be used to locate items in a tuple. Each tuple argument then contains a
    list of items that hash to the corresponding key. Write procedures to create,
    access and modify a hash table represented as a tuple.


4.7 An efficient way of structuring the edges component of a spanning tree
    program is as a heap. A heap can be represented as a binary tree in which
    the value stored at each non-leaf node is less than the value stored at either
    offspring node. A heap is created by heapifying each subtree, in a bottom-up
    fashion. A tree is heapified by comparing the value at the root with the
    values at the left and right offspring. If either is less in value, the root is
    exchanged with the smaller and that subtree is heapified. A sequence of
    least elements can be obtained from a heap by repeatedly selecting the root
    element, replacing it with a large value, and heapifying the tree. Provide
    implementations of a heap using both process and data structures.







Chapter 5 


Programming Problems 


Proficiency in programming is often a question of recognizing familiar problems 
in unfamiliar guises and applying well-understood solutions. This chapter shows 
how to deal with a number of traditional programming problems in Strand. It 
also illustrates the application of the ideas presented in previous chapters to more 
substantial programs. The chapter deals with the following problem areas: 


Mutual Exclusion: Providing mutual exclusion and condition syn- 
chronization when many processes access a shared resource. 


Databases: Implementing atomic transactions on distributed databases. 


Scheduling: Allocating tasks for execution in the presence of depen- 
dencies between tasks. 


Search: Traversing problem structures to locate solutions using a va- 
riety of search strategies. 


Resource Management: Avoiding deadlock and starvation when 
processes compete for shared resources. 


Each section in this chapter begins with a short story that typifies a problem in 
one of the above areas. A solution based on the story is outlined and a Strand 
implementation is presented. Finally, the relationship between this solution and 
the more general problem class is made clear. In the interests of brevity, we do 
not show all of the refinement steps performed when developing the programs. 


5.1 The Family Pastimes Problem 


Mr. and Mrs. Jones are passionate billiards players but their children 
prefer to watch television. The peace of this otherwise happy household 
is threatened by two problems: The children fight over which channel 
to watch and the blare of the television infuriates the parents. 




In frustration, the parents insist on the following arrangement: One 
child is given a hat bearing the words “TV monitor” and is placed in 
charge. Only this child is allowed to touch the controls: The others 
must ask for channel changes by shouting requests such as “Channel 
7!” The parents also require that the children only watch television 
while they are playing billiards. The monitor is told to ensure that the 
television is on only during a game. The parents cry “Started playing” 
and “Stopped playing” to inform the monitor of their progress. 





The TV monitor is the central component of the parents’ solution to their prob- 
lems. It can be represented as a perpetual process that encapsulates the state 
of the television. Processes representing the parents and other children may then 
send messages to the monitor to request state changes. These messages correspond 
to the cries “Channel 7!”, etc. Processes representing children send messages of 
the form: 


channel(NewCh,OldCh) 




NewCh represents the channel that the child wishes to watch and OldCh is a 
variable that is to be assigned the previous state of the television. The OldCh 
variable allows a child to observe that the request has been processed. 


A monitor that only accepts channel messages can be implemented as follows: 


tv_monitor(Rs) :- tv_monitor(Rs,off).


tv_monitor([channel(NewCh,OldCh)|Rs],Ch) :-
    OldCh := Ch, tv_monitor(Rs,NewCh).


The first argument is a message stream; the second argument is initialized to off 
by the first rule and represents the state of the television. To illustrate the use of 
this simple program, consider the following process pool which represents a family 
with three children, one of whom serves as the TV monitor. Note the use of a 
merger (Section 3.4.1) to combine requests generated by the other two children. 


child(Cs1), child(Cs2), merger([merge(Cs1)|Cs2],Cs), tv_monitor(Cs)


This solution must be extended to ensure that children watch television only when 
the parents are playing billiards. This is achieved by modifying the monitor to 
accept messages signaling that the parents have started or stopped playing. This 
information is recorded in the monitor state. Any channel request received when 
the parents are not playing is queued to be processed when the next game begins. 


Program 5.1 implements the refined TV monitor and uses three additional 
arguments. The first represents the parents’ state; the other two comprise a dif- 
ference list (Section 3.2) used to represent pending channel messages. This list 
implements a queue: Messages are added to its end (Pe) and removed from the 
beginning (Pb). 

Initially, the television is off, the parents are not_playing and there are no pend- 
ing requests (R1). The latter condition is represented by an empty difference list. 
A channel request is queued if the parents are not playing (R2). If the parents 
are playing, a channel request is processed as in the initial monitor program (R3). 
If the parents start playing, pending channel requests are processed (R4). The 
difference list is terminated so that its elements can be accessed. If the parents 
stop playing, the television is switched off (R5). This initial process pool utilizes 
the refined monitor: 


child(Cs1), child(Cs2), parents(Ps), 
merger([merge(Cs1),merge(Cs2)|Ps],Cs), 
tv_monitor(Cs) 




tv_monitor(Rs) :- tv_monitor(Rs,off,not_playing,Pb,Pb).           % R1

tv_monitor([channel(NewCh,OldCh)|Rs],Ch,not_playing,Pb,Pe) :-     % R2
    Pe := [{NewCh,OldCh}|Pe1],
    tv_monitor(Rs,Ch,not_playing,Pb,Pe1).

tv_monitor([channel(NewCh,OldCh)|Rs],Ch,playing,Pb,Pe) :-         % R3
    OldCh := Ch,
    tv_monitor(Rs,NewCh,playing,Pb,Pe).

tv_monitor([start|Rs],Ch,_,Pb,Pe) :-                              % R4
    Pe := [],
    pending(Pb,Ch,NewCh),
    tv_monitor(Rs,NewCh,playing,Pb1,Pb1).

tv_monitor([stop|Rs],_,_,Pb,Pe) :-                                % R5
    tv_monitor(Rs,off,not_playing,Pb,Pe).

tv_monitor([],_,_,_,_).                                           % R6

pending([{Ch1,OldCh}|Ps],Ch,NewCh) :-                             % R7
    OldCh := Ch,
    pending(Ps,Ch1,NewCh).

pending([],Ch,NewCh) :-                                           % R8
    NewCh := Ch.


Program 5.1: The TV Monitor 
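

The behavior of Program 5.1 on a particular request stream can be traced with a sequential Python sketch. It is an illustration only, under several assumptions: a Python list of tuples stands in for the merged message stream, an initially empty list stands in for each OldCh reply variable, and a deque stands in for the difference list of pending requests. The comments name the Strand rule each branch imitates.

```python
from collections import deque


def tv_monitor(requests):
    """Process a merged request stream and return the final channel."""
    channel, playing, pending = "off", False, deque()
    for msg in requests:
        if msg[0] == "channel":
            _, new_ch, reply = msg
            if playing:
                reply.append(channel)            # R3: reply with old channel
                channel = new_ch
            else:
                pending.append((new_ch, reply))  # R2: queue the request
        elif msg[0] == "start":                  # R4: serve queued requests
            while pending:
                new_ch, reply = pending.popleft()
                reply.append(channel)
                channel = new_ch
            playing = True
        elif msg[0] == "stop":                   # R5: television goes off
            channel, playing = "off", False
    return channel


r1, r2, r3 = [], [], []
final = tv_monitor([("channel", 7, r1), ("start",), ("channel", 4, r2),
                    ("stop",), ("channel", 9, r3)])
```

In this run the request for channel 7 is held until the parents start playing, the request for channel 4 is served at once, and the request for channel 9 is still waiting when the stream ends, so r3 stays empty.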


5.1.1 Discussion 


The TV monitor illustrates an important programming technique: the monitor. 
A monitor encapsulates and controls access to a shared resource. It provides mu- 
tual exclusion between processes accessing the resource by sequencing accesses. 
It may also delay access until certain conditions are satisfied. This is referred 
to as condition synchronization. In the example, the shared resource is the 
television, mutual exclusion ensures that only one child changes channel at one 
time and condition synchronization delays requests to watch television until the 
parents start playing billiards. 

A monitor is represented in Strand as a perpetual process that receives a stream 
of messages. The monitor definition typically consists of a single rule that initial- 
izes the state of the resource, plus one or more rules defining operations on this 
state. The merging of multiple request streams into a single stream provides mu- 
tual exclusion. Condition synchronization is achieved by queuing requests if the 
process state does not satisfy the appropriate condition. 

In the example, the state of the resource is represented as a data structure. The 
reply values associated with messages permit other processes to observe changes 
in the resource state. Monitors may also be used to encapsulate physical devices 




which are controlled using hardware primitives. 


The TV monitor illustrates the use of three of the basic programming tech- 
niques presented in Chapter 3. Messages from the parents process to the monitor 
use the producer-consumers protocol. The channel requests from the children are 
incomplete messages; the monitor uses the OldCh component to reply. Finally, a 
difference list is used to implement a queue of pending channel messages. 


5.2 The Booking Agency Problem 


An agency offers an all-in-one booking service for musical events. For 
example, a patron of the arts with an evening to kill might request two 
tickets for Mozart at 7pm and two tickets for Dizzy Gillespie at 10pm. 

Telephone enquiries are handled by telephonists who use a booking 
system to reserve tickets. This system ensures that the same seat is not 
booked twice. As tickets cannot be returned, the system also allows 
telephonists to book tickets for several events at once. Hence, tele- 
phonists can request tickets for both Mozart and Gillespie and receive 
tickets either for both or for none. 


A Strand implementation of the booking system is developed here in two parts. 
The first permits telephonists to book tickets for events. The second introduces 
the features that are required to support multiple bookings. 


5.2.1 The Basic Booking System 


The booking system is run by a manager and a number of ticket clerks. 
The telephonists are linked to the manager by pneumatic tubes; other 
tubes link the manager to the clerks. A telephonist writes a booking 
on a piece of paper and places it in a tube. Compressed air propels 
the booking to the office manager who forwards it to the appropriate 
clerk. The clerk returns either the tickets or a sold-out message. 

The office manager can also be requested to open a musical event 
for bookings or to close an event. In the former case, an idle ticket 
clerk is called from the coffee room and set to work. In the latter case, 
the ticket clerk hands in any remaining tickets and stops work. 


A Strand implementation of this booking system consists of two components,
corresponding to the office manager and the ticket clerks. The office_manager receives
booking requests from telephonists and forwards them to ticket clerks. It also
responds to requests to open or close events. The ticket_clerk encapsulates the tickets
allocated to an event and processes requests to book tickets.







The Office Manager. Telephonists can request the booking system to al- 
locate tickets, open a new event and close an event. These requests are handled 
in the first instance by the office manager. They are represented in Strand by 
messages of the form: 


open_event(Event,Number): Open an Event with an initial allocation of
Number tickets.


close_event(Event,Number): Close an Event and return the Number of
remaining tickets.


book(Event,Number,Seats): Allocate a Number of seats for an Event and
return the first seat number in Seats.


Pneumatic tubes are represented by streams and a merger is used to combine the 
streams generated by different telephonists. Thus, the manager receives a single 
stream of messages and maintains a list of labeled streams to clerks. Program 5.2 
implements the office manager and services the above messages. 


office_manager(Rs) :- office_manager(Rs,[]).                      % R1

office_manager([open_event(Name,Number)|Rs],Es) :-                % R2
    ticket_clerk(In,Number),
    office_manager(Rs,[{Name,In}|Es]).
office_manager([close_event(Name,Number)|Rs],Es) :-               % R3
    send(Name,close(Number),Es,Es1),
    remove(Name,Es1,NewEs),
    office_manager(Rs,NewEs).
office_manager([book(Name,Number,Seats)|Rs],Es) :-                % R4
    send(Name,book(Number,Seats),Es,NewEs),
    office_manager(Rs,NewEs).
office_manager([],Es) :-                                          % R5
    close_all(Es).


Program 5.2: The Office Manager 


The list of streams to open events is initially empty (R1). If asked to open an 
event, the manager creates a ticket clerk and stores a labeled stream to it (R2). 
The manager forwards a close request to the appropriate clerk and discards the 
associated stream (R3). Booking requests are also forwarded (R4). For brevity, 
the send and remove processes are not presented here. The former adds a message 
to a named stream and returns the new set of named streams; the latter removes 
a named stream from the set and returns those remaining. 

The Ticket Clerk. A ticket clerk, like the TV monitor presented in Sec- 
tion 5.1, encapsulates a resource (a number of tickets) and processes requests to 
access that resource (to allocate tickets). 

For simplicity, clerks allocate seats in descending numerical order. Hence, a 
clerk need only record the number of tickets remaining. Booking requests received 
from the manager are incomplete messages of the form book(Count,Seats) where 
Seats is a variable used to reply to the telephonist that made the booking request. 
Program 5.3 implements a clerk and services booking requests. 

A booking request is accepted if sufficient seats remain; the seat number of the 
first seat allocated is returned (R1). If insufficient seats are available, a sold out 
message is returned (R2). When asked to close, the clerk returns the number of 
seats remaining and terminates (R3). 




ticket_clerk([book(Count,Seats)|Rs],Number) :-                    % R1
    Count =< Number |
    NewNumber is Number - Count,
    Seats := NewNumber,
    ticket_clerk(Rs,NewNumber).
ticket_clerk([book(Count,Seats)|Rs],Number) :-                    % R2
    Count > Number |
    Seats := "sold out",
    ticket_clerk(Rs,Number).
ticket_clerk([close(Count)|_],Number) :-                          % R3
    Count := Number.


Program 5.3: A Ticket Clerk 


Avoiding Blockages. The manager and ticket clerk programs assume that 
the pneumatic tubes used to communicate requests have unlimited capacity. In 
practice, no more than four cylinders can be in transit through a tube at any 
one time: otherwise the tube may become blocked. This limited capacity can 
be represented using a bounded buffer (Section 3.1.4). Recall that the bounded 
buffer protocol is a variant of the producer-consumers protocol. It requires that 
the consumer generate a stream of variables that the producer then uses to return 
messages to the consumer. To use the protocol, the office manager creates a buffer 
of the appropriate size when it creates a ticket clerk. The ticket clerk is given 
access to the buffer: 


office_manager([open_event(Name,Number)|Rs],Es) :-
    Buffer := [M1,M2,M3,M4|End],
    ticket_clerk(Buffer,End,Number),
    office_manager(Rs,[{Name,Buffer}|Es]).


The ticket clerk must add a variable to the end of the buffer each time it receives 
a request. For example, the first rule is adapted as follows: 


ticket_clerk([book(Count,Seats)|Rs],End,Number) :-
    Count =< Number |
    NewNumber is Number - Count,
    Seats := NewNumber,
    End := [X|NewEnd],
    ticket_clerk(Rs,NewEnd,NewNumber).


The send process used to forward booking requests to a ticket clerk must also be 
modified to deal with the bounded buffer. 
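

The effect of the bounded buffer can be pictured with a small Python sketch. It does not reproduce the Strand mechanism, in which the consumer extends a list of unassigned variables; instead, a counter of free slots plays that role. The class name Tube and its methods are assumptions made for the example, and send returns False where a Strand producer would simply suspend until a slot appears.

```python
from collections import deque


class Tube:
    """A channel that holds at most `capacity` messages in transit."""

    def __init__(self, capacity=4):
        self.free_slots = capacity     # slots the consumer has offered
        self.in_transit = deque()

    def send(self, message):
        """Producer side: refuse the message when no slot is free."""
        if self.free_slots == 0:
            return False
        self.free_slots -= 1
        self.in_transit.append(message)
        return True

    def receive(self):
        """Consumer side: take one message and offer a fresh slot."""
        message = self.in_transit.popleft()
        self.free_slots += 1           # plays the role of End := [X|NewEnd]
        return message
```

With the default capacity, four requests can be sent before the producer is held up; each request the clerk receives frees room for exactly one more.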




5.2.2 Multiple Bookings 










Customers frequently wish to book either a collection of tickets or none 
at all; if an entire evening’s entertainment cannot be arranged, they 
prefer to stay home and play billiards. Telephonists may bundle up a 
number of requests in a single multiple booking. The booking system 
will process either all or none of the requests. 

Multiple bookings are handled by two assistant managers named 
Smith and Jones. The manager hands over multiple bookings to these 
assistants. Smith’s responsibility is to send prebooking requests to the 
appropriate clerks. These clerks signal to Jones whether seats are 




available and await further instructions. 

If all clerks signal that seats are available, Jones instructs them 
to perform their bookings. If any clerk signals that no seats remain, 
Jones instructs all clerks to discard their bookings. Clerks revert to 
their normal duties after receiving instructions. 


A Strand implementation of this system requires extensions to both the office 
manager and ticket clerk programs presented previously. 

A New Office Manager. A multiple booking request contains a list of 
bookings. For example: 


multiple([book(mozart,2,Seats1),book(gillespie,2,Seats2)]) 


The office_manager process intercepts these messages and creates processes to per- 
form the tasks of Smith and Jones: 


office_manager([multiple(Bookings)|Rs],Es) :-
    smith(Bookings,Es,NewEs,Book,Acks),
    jones(Acks,Book),
    office_manager(Rs,NewEs).


Smith sends a prebooking request to every clerk involved in the multiple booking. 
These have the form prebook(Count,Seats,Book,Ack). Count is the number of tickets 
requested and Seats will represent the seats allocated. Ack is an acknowledgment 
variable used by the clerk to signal whether seats required by the booking are 
available. Finally, Book is a single variable used by Jones to instruct all of the 
clerks involved to either perform or abandon the booking. 


smith([book(Event,Count,Seats)|Bs],Es,NewEs,Book,Acks) :-
    Acks := [Ack|Acks1],
    send(Event,prebook(Count,Seats,Book,Ack),Es,Es1),
    smith(Bs,Es1,NewEs,Book,Acks1).

smith([],Es,NewEs,_,Acks) :- NewEs := Es, Acks := [].


Smith passes the list of acknowledgments (Acks) to Jones. This process is defined 
as follows: 


jones([error|Acks],Book) :- Book := abort.                        % R1
jones([ok|Acks],Book) :- jones(Acks,Book).                        % R2
jones([],Book) :- Book := proceed.                                % R3


Jones examines the acknowledgments: If any indicates that tickets are not
available, all clerks are instructed to abandon the booking (R1). If all
acknowledgments are positive, clerks are instructed to proceed with the booking (R3).
Figure 5.1 illustrates the communication performed in the course of a successful
multiple booking. In (a), a prebooking request is passed to each clerk involved
in the booking. In (b), acknowledgments are returned. In (c), the Book variable
signals that the booking should proceed.


[Figure 5.1 panels: (a) prebook(Book,Ack1), prebook(Book,Ack2) and
prebook(Book,Ack3) are sent to the clerks; (b) acknowledgments are returned;
(c) Book:=proceed]


Figure 5.1: A Successful Multiple Booking


The incomplete messages generated by Smith contain two variables: Ack and 
Book. The Ack variables are used in a familiar way: to communicate values back 
to Jones. The single Book variable is used to implement a communication protocol 
that we have not encountered before: Jones uses it to broadcast a message to all 
clerks. 


A producer can use a single
variable to broadcast values
to many consumers.
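

A rough analogue of this broadcast protocol can be written in Python, where a Future from the standard concurrent.futures module behaves somewhat like a single-assignment variable: it starts out without a value, any number of parties can register interest in it, and it is given a value once. The sketch below is an illustration under that analogy, not a rendering of Strand semantics. The names book, log and make_clerk are assumptions, and the Python documentation notes that creating a Future directly is intended mainly for testing, which suits a demonstration like this one.

```python
from concurrent.futures import Future

book = Future()     # plays the role of the single Book variable
log = []            # records what each clerk observed


def make_clerk(name):
    """Return a callback that runs once the shared variable has a value."""
    def on_decision(done):
        log.append((name, done.result()))
    return on_decision


# Each clerk registers interest in the same variable.
for name in ("mozart", "gillespie"):
    book.add_done_callback(make_clerk(name))

# One assignment by the producer reaches every waiting clerk.
book.set_result("proceed")
```

The producer never needs to know how many consumers are waiting; it assigns the variable once and every registered consumer sees the same value.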







A New Ticket Clerk. Two additional rules are added to the clerk definition
to deal with prebooking requests. The first signals that a booking is valid and
spawns a process ticket_aux to determine if the booking should be applied; the
second rule rejects a booking:


ticket_clerk([prebook(Count,Seats,Book,Ack)|Rs],Number) :-
    Count =< Number |
    Ack := ok,
    ticket_aux(Book,Count,Seats,Number,NewNumber),
    ticket_clerk(Rs,NewNumber).
ticket_clerk([prebook(Count,Seats,_,Ack)|Rs],Number) :-
    Count > Number |
    Ack := error,
    Seats := "sold out",
    ticket_clerk(Rs,Number).


The process ticket_aux waits for the variable Book to be assigned a value and then
either applies or rejects the booking:


ticket_aux(proceed,Count,Seats,Number,NewNumber) :-
    NewNumber is Number - Count,
    Seats := NewNumber.
ticket_aux(abort,_,Seats,Number,NewNumber) :-
    NewNumber := Number,
    Seats := "sold out".


5.2.3 Discussion 


The material in this section has been concerned with distributed databases and 
atomic transactions. A distributed database consists of two or more separate 
databases that may be located on different computers. As databases are often 
shared by several processes, they are generally encapsulated in monitors. In this 
example, ticket clerks fulfill this role. The database managed by a clerk consists of
a single integer; however, the same techniques apply to more complex databases.

A set of updates to be performed either in their entirety or not at all (such 
as a multiple booking) is termed an atomic transaction. The algorithm used 
here to provide atomicity is termed two-stage commit. The two stages of the 
algorithm have been differentiated by employing two processes named Smith and 
Jones. Smith applies the first stage, making a prewrite request to each entity 
involved in the transaction. This specifies the update to be performed. Each entity 
acknowledges this request to indicate whether or not its update can proceed. Jones 
applies the second stage of the algorithm by signaling either that all updates can 
proceed or that all are to be abandoned. 
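
The two stages can be summarized in a short sequential Python sketch. It is a simplification under stated assumptions: there is no concurrency, each event is assumed to appear at most once in a multiple booking, and the names Clerk, prebook, commit and multiple_booking are chosen for the example. Comments mark the parts that correspond to Smith and Jones.

```python
class Clerk:
    """Encapsulates the number of tickets remaining for one event."""

    def __init__(self, tickets):
        self.tickets = tickets

    def prebook(self, count):
        """Stage one: acknowledge whether the update could be applied."""
        return "ok" if count <= self.tickets else "error"

    def commit(self, count):
        """Stage two: apply the update and return the first seat number."""
        self.tickets -= count
        return self.tickets


def multiple_booking(clerks, bookings):
    """Book every (event, count) pair in bookings, or none of them."""
    # Smith: send a prebooking request to every clerk involved.
    acks = [clerks[event].prebook(count) for event, count in bookings]
    # Jones: abandon everything if any acknowledgment is negative.
    if "error" in acks:
        return ["sold out"] * len(bookings)
    return [clerks[event].commit(count) for event, count in bookings]


clerks = {"mozart": Clerk(10), "gillespie": Clerk(1)}
failed = multiple_booking(clerks, [("mozart", 2), ("gillespie", 2)])
```

Because only one Gillespie ticket remains, the whole request is abandoned and the Mozart allocation is left untouched.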

The booking agency program has been constructed using two stream communi- 
cation protocols. The incomplete message protocol is used when passing requests 




from telephonists to clerks: Variables in these requests return replies to telephon- 
ists. The bounded buffer protocol is used to prevent the manager running ahead 
of the clerks and blocking the pneumatic tubes. 

In passing, the reader may be surprised to find references to an archaic pneu- 
matic communication system in this book. However, field research performed by 
the authors indicates that systems of this type are alive and well and living in a 
large suburban bank in Chicago! 


5.3 The Speedy Pizza Problem 


Speedy’s pizzas are so popular that customers place standing orders 
for years at a time. Speedy’s couriers hence have a fixed number of 
deliveries to make every evening. The scheduling of deliveries is com- 
plicated by one factor: Long-standing customers become upset if they 
hear of more recent customers receiving pizzas before them. How is 
Speedy to organize deliveries without offending loyal customers? 

Speedy has developed a novel system to deal with this problem. It 
involves writing each order on a ticket. Each customer is then con- 
sidered in turn; all orders that must be delivered after that customer 
are identified. These orders are attached to the ceiling using a single 
pin bearing the customer’s name. Thus, eventually most tickets are 
pinned to the ceiling by one or more pins. 

Speedy starts the evening by handing couriers orders that are not 
pinned to the ceiling. Each courier speeds off to deliver a pizza and then 
returns for another. A returning courier extracts the pin representing 
the completed order. This may free tickets; these float to Speedy below, 
who collects them for subsequent delivery. This process continues until 
all pizzas are delivered, at which point the couriers eat any remaining 
pizzas. 


An evening’s orders can be represented by a pizza delivery graph, in which arrows 
represent the relation must be delivered before. For example, the following graph 
states that orders for the Coxs, Parrys, Burks and Macaulays can be delivered 
immediately. The order for the Graus cannot be delivered until after that of the 
Coxs and the Parrys, the Bethels cannot be delivered until after the Parrys and 
the Burks, and the Tills get their pizza last. 


Coxs     Parrys     Burks     Macaulays
    \    /    \     /
    Graus     Bethels
        \     /
         Tills







An initial refinement of the delivery system consists of three components: the 
ceiling, Speedy and the couriers: 


deliveries(Couriers,Orders) :-
    length(Orders,Cnt),
    ceiling(Orders,Os),
    speedy(Cnt,Os,Rs),
    couriers(Couriers,Rs).


The number of couriers and the evening’s orders are provided as parameters to
this program. The variable Os represents the stream of tickets floating down from
the ceiling. The variable Rs represents the stream of pizza requests generated
by couriers. Both these streams are received by Speedy. The predefined length
process computes the number of orders to be processed (Cnt).




5.3.1 Speedy 


The speedy process must handle requests to add (write) elements to a set of 
pending orders as well as requests to remove (read) elements. Requests to add 
elements correspond to tickets falling from the ceiling; requests to remove elements 
correspond to requests from couriers. The removal of an order must be an atomic 
action so that no pizza is delivered more than once. 

To satisfy these constraints, Program 5.4 implements Speedy using a blackboard 
(Section 3.4). The merger permits many processes to communicate with the black- 
board process concurrently (R1). The bboard process services a stream of read 
and write requests. It encapsulates a blackboard data structure that records the 
number of orders, pending orders and outstanding courier requests. The process 
ensures that each element written is read exactly once. Initially, the blackboard 
is empty (R2). A write request causes an order to be added to the list of pending 
orders (R3). A read request is registered in the list of outstanding courier requests 
(R4). If there is a pending order and an outstanding request, then the request is 
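granted; this permits the courier to deliver a pizza (R5). If there are no further
pending orders (C=0), all outstanding courier requests are serviced with the reply
"go eat" (R6). The process terminates when its input stream is closed and no
orders or requests remain (R7).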


speedy(Cnt,Os,Rs) :-                                % R1
    merger([merge(Os)|Rs],Msgs),
    bboard(Cnt,Msgs).

bboard(Cnt,Is) :- bboard1(Is,{Cnt,[],[]}).          % R2

bboard1([write(O)|Is],{C,Os,Ps}) :-                 % R3
    C1 is C - 1, bboard1(Is,{C1,[O|Os],Ps}).

bboard1([read(O)|Is],{C,Os,Ps}) :-                  % R4
    bboard1(Is,{C,Os,[O|Ps]}).

bboard1(Is,{C,[O|Os],[P|Ps]}) :-                    % R5
    P := O, bboard1(Is,{C,Os,Ps}).

bboard1(Is,{0,[],[P|Ps]}) :-                        % R6
    P := "go eat", bboard1(Is,{0,[],Ps}).

bboard1([],{_,[],[]}).                              % R7


Program 5.4: Speedy 
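
The matching performed by the bboard process can be sketched sequentially in Python. This is an illustrative analogue rather than a translation of Program 5.4: the message tuples and the FIFO queues are assumptions made for the sketch (Program 5.4 adds new orders and requests at the front of its lists), and a courier name stands in for the unbound variable carried by a read request.

```python
from collections import deque

GO_EAT = "go eat"


def bboard(count, messages):
    """Replay ('write', order) and ('read', courier) messages in sequence.

    count is the number of orders that have not yet arrived.
    Returns the (courier, reply) pairs in the order the replies are issued.
    """
    pending = deque()   # orders written but not yet handed to a courier
    waiting = deque()   # couriers whose read request is still outstanding
    replies = []
    for kind, item in messages:
        if kind == "write":                 # R3: one more order has arrived
            count -= 1
            pending.append(item)
        elif kind == "read":                # R4: register the request
            waiting.append(item)
        else:
            raise ValueError(kind)
        while pending and waiting:          # R5: grant a request
            replies.append((waiting.popleft(), pending.popleft()))
        while count == 0 and not pending and waiting:   # R6: nothing left
            replies.append((waiting.popleft(), GO_EAT))
    return replies


messages = [("read", "c1"), ("write", "coxs"), ("read", "c2"),
            ("read", "c1"), ("write", "parrys"), ("read", "c2")]
print(bboard(2, messages))
```

Because each order is removed from the pending queue in the same step that grants it, no order can be handed to two couriers.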


5.3.2 The Ceiling 


An evening’s orders can be represented by a list of terms with the form: 
{Name,PreviousOrders}


Each term represents an order and relates a customer Name to those customers
whose orders must be completed first. For example, the set of orders depicted in
the pizza delivery graph is represented by the following term: 


[{coxs,[]},
 {parrys,[]},
 {burks,[]},
 {macaulays,[]},
 {graus,[coxs,parrys]},
 {bethels,[parrys,burks]},
 {tills,[graus,bethels,macaulays]}]


The ceiling process creates a ticket process for each order in this list. Each ticket 
process waits until the orders on which it is dependent have been delivered and 
then sends its order to Speedy. 

The method used to delay the sending of orders is analogous to Speedy’s use of 
pins. Each order has a pin variable associated with it. A courier assigns a value to 
this variable when the order has been processed (delivered). Thus, orders can be 
sequenced by causing an order process to delay until the variables associated with 
dependent orders have been assigned values. The delaying corresponds to pinning 
an order to the ceiling; assigning values to variables corresponds to extracting pins. 
For example, the following ticket processes are created to represent the above set 
of orders: 


ticket(coxs,C,[],_)
ticket(parrys,P,[],_)
ticket(burks,B1,[],_)
ticket(macaulays,M,[],_)
ticket(graus,G,[C,P],_)
ticket(bethels,B2,[P,B1],_)
ticket(tills,T,[G,B2,M],_)


Thus for example the Bethels’ order cannot be processed until the variables P and 
B1 are assigned values; this occurs when the orders for the Parrys and the Burks 
are delivered. Program 5.5 creates this process structure from the original orders 
data structure. 

The process structure is created by scanning the orders data structure (Orders) 
and creating a ticket process for each order (R2,4). Each ticket process is given a 
stream to Speedy (merge(ToSpeedy): R2). The irregular communication pattern 
represented by the shared pin variables is developed by creating a dictionary process 
(R1) with which all ticket processes can communicate. This communication occurs 
via the stream Dict (R1). This stream is constructed as a difference list by the ticket 
processes (Dict,Dict1: R2). Each ticket process registers its own pin variable with this 
dictionary (R4) and requests the pin variables associated with the ticket processes 
corresponding to dependent orders (R5). Finally, the ticket process suspends until 
all variables retrieved from the dictionary are assigned a value (done: R7) and 
then sends its order to Speedy (R8). 




ceiling(Orders,Os) :-                               % R1
    init_tickets(Orders,Os,Dict),
    dictionary(Dict).

init_tickets([{Name,Ds}|Orders],Os,Dict) :-         % R2
    Os := [merge(ToSpeedy)|Os1],
    init_ticket(Name,Ds,Dict,Dict1,ToSpeedy),
    init_tickets(Orders,Os1,Dict1).

init_tickets([],Os,Dict) :- Os := [], Dict := [].   % R3

init_ticket(Name,Ds,Dict,Dict2,ToSpeedy) :-         % R4
    Dict := [write(Name,Pin)|Dict1],
    pins(Ds,Dict1,Dict2,Pins),
    ticket(Name,Pin,Pins,ToSpeedy).

pins([Name|Ds],Dict,Dict2,Pins) :-                  % R5
    Dict := [read(Name,Pin)|Dict1],
    Pins := [Pin|Pins1],
    pins(Ds,Dict1,Dict2,Pins1).

pins([],Dict,Dict1,Pins) :- Dict := Dict1, Pins := [].   % R6

ticket(Name,Pin,[done|Pins],ToSpeedy) :-            % R7
    ticket(Name,Pin,Pins,ToSpeedy).

ticket(Name,Pin,[],ToSpeedy) :-                     % R8
    ToSpeedy := [write({Name,Pin})].


Program 5.5: Building the Ceiling 
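
The pin mechanism can be imitated in Python with one threading.Event per order. The sketch below is an assumption-laden analogue, not Program 5.5 itself: the function run_tickets, the use of threads for ticket processes, the plain dict that replaces the dictionary process, and the single courier played by the calling thread are all choices made for illustration. It also assumes the dependencies contain no cycle; with a cycle the courier loop would wait forever.

```python
import queue
import threading

ORDERS = [("coxs", []), ("parrys", []), ("burks", []), ("macaulays", []),
          ("graus", ["coxs", "parrys"]),
          ("bethels", ["parrys", "burks"]),
          ("tills", ["graus", "bethels", "macaulays"])]


def run_tickets(orders):
    """Release each order only after the orders it depends on are delivered."""
    pins = {name: threading.Event() for name, _ in orders}   # one pin per order
    to_speedy = queue.Queue()

    def ticket(name, deps):
        for dep in deps:           # R7: suspend until each pin is extracted
            pins[dep].wait()
        to_speedy.put(name)        # R8: send the order to Speedy

    threads = [threading.Thread(target=ticket, args=order) for order in orders]
    for t in threads:
        t.start()

    delivered = []
    for _ in orders:               # a single courier delivers every order
        name = to_speedy.get()
        delivered.append(name)
        pins[name].set()           # delivery extracts the pin
    for t in threads:
        t.join()
    return delivered


print(run_tickets(ORDERS))
```

The exact delivery order varies from run to run, but an order never appears before the orders it depends on.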


The Dictionary. The dictionary is a commonly used Strand process. It ser- 
vices read and write requests on a set of {Key,Value} pairs; these requests retrieve 
and add keyed values respectively. The process presented in Program 5.6 main- 
tains its data set as an unordered list; more efficient structures are suggested in 
the exercises. 


The dictionary is initialized to be empty (R1). Read and write requests cause 
the dictionary D to be scanned for the required Key using a lookup process (R2,3). 
This process considers each dictionary entry in turn; if the key matches the first 
element, it reads or writes the associated value (R5,6). If the key does not match, 
the rest of the dictionary is scanned (R7). If the key named in either a read or 
write request is not found, an entry is added to the front of the dictionary (R8). 
It does not matter in which order requests are made. Note that a key may be read 
many times but can only be written once. 




dictionary(Rs) :- dict(Rs,[]).                      % R1

dict([write(Key,Val)|In],D) :-                      % R2
    lookup(write,D,Key,Val,D,NewD),
    dict(In,NewD).

dict([read(Key,Val)|In],D) :-                       % R3
    lookup(read,D,Key,Val,D,NewD),
    dict(In,NewD).

dict([],_).                                         % R4

lookup(write,[{Key,Val}|_],Key,WVal,OldD,NewD) :-   % R5
    Val := WVal, NewD := OldD.
lookup(read,[{Key,Val}|_],Key,RVal,OldD,NewD) :-    % R6
    RVal := Val, NewD := OldD.
lookup(T,[{DKey,_}|D],Key,Val,OldD,NewD) :-         % R7
    Key =\= DKey |
    lookup(T,D,Key,Val,OldD,NewD).
lookup(_,[],Key,Val,OldD,NewD) :-                   % R8
    NewD := [{Key,Val}|OldD].


Program 5.6: A Dictionary 
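
The reason that request order does not matter is that every entry holds a single-assignment variable, which may be handed out before it has a value. A minimal Python sketch of this idea follows. The classes Var and Dictionary are hypothetical names introduced here, and the sketch simplifies Program 5.6 in one respect: write assigns a value directly, whereas the Strand version links the stored variable to a variable supplied by the writer.

```python
class Var:
    """A single-assignment variable: unbound until assigned exactly once."""

    _UNBOUND = object()

    def __init__(self):
        self._value = Var._UNBOUND

    def is_bound(self):
        return self._value is not Var._UNBOUND

    def assign(self, value):
        if self.is_bound():
            raise ValueError("variable already assigned")
        self._value = value

    def get(self):
        if not self.is_bound():
            raise LookupError("variable not yet assigned")
        return self._value


class Dictionary:
    def __init__(self):
        self._entries = []                 # unordered list of (key, Var) pairs

    def _lookup(self, key):
        for k, var in self._entries:       # R7: scan the entries in turn
            if k == key:                   # R5, R6: key found
                return var
        var = Var()                        # R8: unknown key, add at the front
        self._entries.insert(0, (key, var))
        return var

    def read(self, key):
        return self._lookup(key)           # may return a still-unbound Var

    def write(self, key, value):
        self._lookup(key).assign(value)    # a second write raises ValueError


d = Dictionary()
pin = d.read("parrys")       # read arrives before the write
d.write("parrys", "done")
print(pin.get())
```

The linear scan mirrors the unordered list of Program 5.6; a Python dict would be the obvious replacement if lookup cost mattered.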


5.3.3 Speedy’s Couriers 


The final program component to be provided is the couriers. Program 5.7 creates 
N couriers, giving each a stream to the speedy process. A courier initially is 
ready to deliver a pizza (R3). A courier that is ready to make a delivery requests
an order from Speedy and processes the order (R4). The order is processed by 
assigning the value done to the pin variable associated with the order (R6); recall 
that this allows other orders to become available. A courier continues requesting 
and processing orders until Speedy informs it that no more orders are available 
(R5,7). 


5.3.4 Discussion 


This section has been concerned with task scheduling. The structure developed 
by Speedy to organize pizza deliveries solves a general scheduling problem, namely: 


“Given a set of dependent tasks, allocate tasks to workers so as to keep 
workers busy without violating dependencies.” 


The algorithm employed by Speedy is simple: Tasks for which dependencies have 
been resolved are allocated to workers on a first-come, first-served basis. Allocation 
is performed without concern for task size. 
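
The scheduling policy can be illustrated with a small sequential simulation in Python. The sketch makes assumptions that the Strand program does not: every task takes one time step, workers proceed in lockstep rounds, and the function name schedule and its arguments are invented for the example.

```python
def schedule(tasks, workers):
    """First-come, first-served scheduling of dependent unit-time tasks.

    tasks maps each task name to the names that must complete first.
    Returns a list of rounds; each round lists the tasks run in parallel.
    """
    remaining = {name: set(deps) for name, deps in tasks.items()}
    ready = [name for name, deps in remaining.items() if not deps]
    for name in ready:
        del remaining[name]
    rounds = []
    while ready:
        batch, ready = ready[:workers], ready[workers:]
        rounds.append(batch)
        for done in batch:
            for name in list(remaining):
                remaining[name].discard(done)
                if not remaining[name]:      # all dependencies resolved
                    ready.append(name)
                    del remaining[name]
    if remaining:
        raise ValueError("cyclic or unsatisfiable dependencies")
    return rounds


orders = {"coxs": [], "parrys": [], "burks": [], "macaulays": [],
          "graus": ["coxs", "parrys"],
          "bethels": ["parrys", "burks"],
          "tills": ["graus", "bethels", "macaulays"]}
print(schedule(orders, 2))
```

With two workers the pizza orders need four rounds; the last round uses only one worker because the Tills' order depends on everything else.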


5.4. The Noble Ancestors Problem 135 


couriers(N,ToSpeedy) :-                             % R1
    N > 0 |
    ToSpeedy := [merge(ToS)|ToSpeedy1],
    N1 is N - 1,
    courier(ToS),
    couriers(N1,ToSpeedy1).
couriers(0,ToSpeedy) :- ToSpeedy := [].             % R2

courier(ToS) :- courier(ToS,next).                  % R3

courier(ToS,next) :-                                % R4
    ToS := [read(Order)|ToS1],
    process_order(Order,Next),
    courier(ToS1,Next).

courier(ToS,halt) :- ToS := [].                     % R5

process_order({_,Pin},Next) :-                      % R6
    Next := next, Pin := done.

process_order("go eat",Next) :- Next := halt.       % R7


Program 5.7: The Couriers 


The program has been constructed using four of the basic programming tech- 
niques presented in Chapter 3: producer-consumers, incomplete messages, black- 
boards and difference lists. In particular, the speedy process encapsulates a black- 
board data structure and provides atomic read and write operations. A difference 
list permits many processes to cooperate in constructing a list of dictionary re- 
quests. 


5.4 The Noble Ancestors Problem 


Though of lowly birth, Fred and Mildred Bloggs are convinced that 
their family includes members of the French aristocracy. They have 
spent years collecting information about their ancestors in a quest to 
elevate their social standing. Unfortunately, their research has been 
temporarily curtailed by the difficulty of sifting through piles of an- 
cient documents. In an effort to locate information more quickly, they 
develop a sophisticated card-catalogue system and techniques for pe- 
rusing this catalogue. They eventually discover four or five alternative 
schemes for uncovering the long-desired noble ancestors. 


[Figure: the Bloggs' family tree.]

The Bloggs’ problem is typical of many problems in which a large amount of 
data is to be searched for items that satisfy specific criteria. Rather than solve 
the Bloggs’ problem in isolation, we instead present a variety of abstract search 
algorithms. These can be used to traverse search trees expressed in terms of the 
following components: 


1. An initial state. 


2. new_states(State,States): A generation function which can be applied to a
state to obtain a (potentially empty) set of accessible states. 


3. final_state(State,Result): A termination function which when applied to a 
State assigns Result to true if the state satisfies the search criteria and to 
false otherwise. 


The first two components define a search tree. For example, consider the Bloggs’ 
family tree illustrated above. The initial state is “F. Bloggs” and the generation 
function is “parents-of”. The termination function could be defined as follows: 




final_state(Person,Result) :-
    noble(Person,Rating), result(Rating,Result).

result(Rating,Result) :-
    Rating =\= commoner | Result := true.
result(commoner,Result) :- Result := false.


5.4.1 Exhaustive Parallel Search 


A simple approach to the search problem is to exhaustively inspect all branches 
of the search tree concurrently, as shown in Program 5.8. A call to this program 
specifies an initial state. The termination function is applied to each state to de- 
termine whether it is a final state (R1). If so, the search on that branch terminates 
(R3). If not, the search is extended to other accessible states using the generation 
function (R2,4,5). 


search(State) :-                                    % R1
    final_state(State,R), search(State,R).

search(State,false) :-                              % R2
    new_states(State,States), search_all(States).

search(_,true).                                     % R3

search_all([State|Ss]) :-                           % R4
    search(State), search_all(Ss).

search_all([]).                                     % R5


Program 5.8: Exhaustive Parallel Search 
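
For readers more familiar with sequential languages, the same search skeleton can be written in Python. The sketch below visits branches one after another rather than concurrently, so it shows the structure of Program 5.8 but not its parallelism. The family tree data, the set NOBLE and the helper names parents_of and is_noble are invented for the example.

```python
# Hypothetical family tree: each person maps to a list of parents.
TREE = {
    "F. Bloggs": ["S. Bloggs", "J. Pilkington-Jones"],
    "S. Bloggs": ["T. Bloggs", "M. Smith"],
    "J. Pilkington-Jones": ["Count Philip", "A. Miller"],
}
NOBLE = {"Count Philip"}


def parents_of(person):
    """Generation function: the accessible states of a state."""
    return TREE.get(person, [])


def is_noble(person):
    """Termination function: True if the state satisfies the criteria."""
    return person in NOBLE


def search(state, new_states, final_state, visited):
    visited.append(state)
    if final_state(state):            # R3: this branch terminates
        return
    for nxt in new_states(state):     # R2, R4, R5: extend to accessible states
        search(nxt, new_states, final_state, visited)


visited = []
search("F. Bloggs", parents_of, is_noble, visited)
print(visited)
```

Passing the generation and termination functions as arguments keeps the search routine independent of the genealogy data, in the same spirit as the abstract components listed earlier.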


The program does not specify the order in which branches of the search tree 
are visited. The only thing that can be said about the program’s behavior is 
that (assuming sufficient time and space) every part of the tree will eventually 
be visited. However, as the program creates a large number of processes, it is 
not particularly practical. Fortunately, it can be extended in a variety of ways to 
reduce the number of states visited. It can also be enhanced to collect some or all 
solutions. 


5.4.2 Depth-First Search 


Genealogists frequently concentrate on the paternal line when exploring a person’s 
ancestry. For example, they would explore the following sequence of states in the 
Bloggs’ family tree: F. Bloggs, S. Bloggs, T. Bloggs, etc. The paternal-line-first 
strategy corresponds to a useful and efficient search strategy called depth-first 
search. 




A depth-first search program can be obtained from Program 5.8 by threading a 
short-circuit through the process structure (Section 3.3). Program 5.9 shows how 
this circuit is used to constrain the order in which search processes are executed. 
The circuit ensures that a state cannot be explored until all states above it and 
to the left in the tree have been completely explored. This is achieved by waiting 
for the value done at appropriate synchronization points (R3,5). 


dsearch(State) :- dsearch(State,done,_).            % R1

dsearch(State,D,D1) :-                              % R2
    final_state(State,R), dsearch(State,R,D,D1).

dsearch(State,false,done,D1) :-                     % R3
    new_states(State,States), dsearch_all(States,done,D1).

dsearch(_,true,D,D1) :- D1 := D.                    % R4

dsearch_all([State|Ss],done,D2) :-                  % R5
    dsearch(State,done,D1), dsearch_all(Ss,D1,D2).

dsearch_all([],D,D1) :- D1 := D.                    % R6


Program 5.9: Depth-First Search 


Program 5.9 effectively uses a chain of suspended processes to implement a
stack of unexplored branches in the search tree. This stack can also be
implemented explicitly as a data structure; in particular, as a list, as shown in
Program 5.10. In this program we assume that the new_states function constructs
new states as a difference list; this permits us to concatenate the new states to
the remaining states (Ss) in constant time (R4).


dsearch(State) :- dsearch_all([State]).             % R1

dsearch_all([State|Ss]) :-                          % R2
    final_state(State,R), dsearch(State,R,Ss).

dsearch_all([]).                                    % R3

dsearch(State,false,Ss) :-                          % R4
    new_states(State,States,Ss), dsearch_all(States).

dsearch(_,true,Ss) :- dsearch_all(Ss).              % R5


Program 5.10: Alternative Depth-First Search 
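
The explicit stack of Program 5.10 corresponds closely to the usual iterative depth-first loop. The following Python sketch uses a list whose end is the top of the stack; the family tree and the helper names are hypothetical, and the returned visiting order is added only so that the behavior can be observed.

```python
# Hypothetical family tree: each person maps to a list of parents.
TREE = {
    "F. Bloggs": ["S. Bloggs", "J. Pilkington-Jones"],
    "S. Bloggs": ["T. Bloggs", "M. Smith"],
    "J. Pilkington-Jones": ["Count Philip", "A. Miller"],
}
NOBLE = {"Count Philip"}


def parents_of(person):
    return TREE.get(person, [])


def is_noble(person):
    return person in NOBLE


def dsearch(initial, new_states, final_state):
    """Depth-first search with an explicit stack of unexplored states."""
    stack = [initial]                 # R1: the stack starts with one state
    order = []
    while stack:                      # R3: stop when the stack is empty
        state = stack.pop()           # R2: take the next state
        order.append(state)
        if not final_state(state):    # R4: push offspring ahead of the rest
            stack.extend(reversed(new_states(state)))
        # R5: a final state is not expanded; continue with the rest
    return order


print(dsearch("F. Bloggs", parents_of, is_noble))
```

Pushing the offspring in reverse keeps the leftmost offspring on top, so the paternal line is followed first.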




5.4.3 Depth-Bounded Search 


Execution of Program 5.10 on the Bloggs’ database uncovers many Bloggs but no 
nobility. Yet, as the family tree shows, noble ancestors do exist on other branches. 
The problem is that a depth-first search is efficient but potentially incomplete: It 
may fail to find even very shallow solutions if they are to the right of deep branches 
in the search tree. One solution to this problem is to explore only a fixed number 
of tree levels at a time. Program 5.11 implements this strategy by extending 
Program 5.8 with an additional depth argument N. This is decremented at each 
level in the tree; the search stops when N reaches zero. The following process 
executes this program to explore the Bloggs’ family tree to five generations. 


ksearch('F. Bloggs',5)


ksearch(State,N) :-                                 % R1
    N > 0 | final_state(State,R), ksearch(State,N,R).
ksearch(_,0).                                       % R2
ksearch(State,N,false) :-                           % R3
    N1 is N - 1,
    new_states(State,States),
    ksearch_all(States,N1).

ksearch(_,_,true).                                  % R4

ksearch_all([State|Ss],N) :-                        % R5
    ksearch(State,N), ksearch_all(Ss,N).

ksearch_all([],_).                                  % R6


Program 5.11: Depth-bounded Exhaustive Search 


It is possible to extend Program 5.11 so as to elaborate the first N levels of a tree 
concurrently and then explore subsequent levels using depth-first search. This is 
achieved by replacing the second rule with: 


ksearch(State,0) :- dsearch(State).


The resulting program creates one dsearch process for each leaf of the search tree 
at level N. These processes execute concurrently. 
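
A sequential Python sketch of the depth bound and of the variant rule is given below. It is illustrative only: the list found, the optional callback at_bound and the family tree data are assumptions added for the example, and Program 5.11 itself does not collect solutions.

```python
# Hypothetical family tree: each person maps to a list of parents.
TREE = {
    "F. Bloggs": ["S. Bloggs", "J. Pilkington-Jones"],
    "S. Bloggs": ["T. Bloggs", "M. Smith"],
    "J. Pilkington-Jones": ["Count Philip", "A. Miller"],
}
NOBLE = {"Count Philip"}


def parents_of(person):
    return TREE.get(person, [])


def is_noble(person):
    return person in NOBLE


def ksearch(state, n, new_states, final_state, found, at_bound=None):
    """Explore at most n levels below and including state."""
    if n == 0:                        # R2: the depth bound has been reached
        if at_bound is not None:      # variant rule: hand the state over
            at_bound(state)
        return
    if final_state(state):            # R4 (recording it is an addition)
        found.append(state)
        return
    for nxt in new_states(state):     # R3, R5, R6
        ksearch(nxt, n - 1, new_states, final_state, found, at_bound)


found = []
ksearch("F. Bloggs", 3, parents_of, is_noble, found)
print(found)
```

Supplying a depth-first routine as at_bound gives the hybrid strategy described above, with one further search started for each state at the bound.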


5.4.4 Heuristic Search 


The Bloggs’ database is so large that it would take years to search it all. One 
way to achieve a more effective search is to use heuristics to determine the order 
in which states are explored. For example, the Bloggs might suppose that people
with long names are more likely to have noble ancestors than people with short
names. It is thus better to explore the ancestors of Pilkington-Jones first. Seeking 
as before to keep the search programs general, we introduce the notion of an 
estimation function for a state: 


value(State, Value): When applied to a State, returns a number Value 
representing an estimation as to how likely it is that a state will lead 
to a solution. 


The Bloggs’ estimation function can be defined simply as: 
value(Name,Value) :- length(Name,Value).


This function allows us to design heuristic search programs that visit the search 
tree in a more directed fashion. For example, it is possible to maintain a queue of 
pending states, ordered according to estimated value, and at each step select the 
state with the highest value. A program that applies this strategy can be obtained 
by modifying this rule from Program 5.10: 


dsearch(State,false,Ss) :-                          % R4
    new_states(State,States,Ss),
    dsearch_all(States).


The new rule spawns an insert process to insert the new states into the remaining 
states data structure according to their estimated value. 


hsearch(State,false,Ss) :-                          % R4
    new_states(State,States),
    insert(States,Ss,NewSs),
    hsearch_all(NewSs).
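
The ordered queue of pending states can be sketched in Python with the standard heapq module standing in for the insert process. This is one possible realization under stated assumptions: the tie-breaking counter, the family tree data and the returned visiting order are additions for the example, and name length is used as the estimate, as suggested above.

```python
import heapq
from itertools import count

# Hypothetical family tree: each person maps to a list of parents.
TREE = {
    "F. Bloggs": ["S. Bloggs", "J. Pilkington-Jones"],
    "S. Bloggs": ["T. Bloggs", "M. Smith"],
    "J. Pilkington-Jones": ["Count Philip", "A. Miller"],
}
NOBLE = {"Count Philip"}


def parents_of(person):
    return TREE.get(person, [])


def is_noble(person):
    return person in NOBLE


def hsearch(initial, new_states, final_state, value):
    """Visit states in decreasing order of estimated value."""
    tie = count()   # breaks ties in favor of states generated earlier
    # heapq is a min-heap, so values are negated to pop the highest first.
    pending = [(-value(initial), next(tie), initial)]
    order = []
    while pending:
        _, _, state = heapq.heappop(pending)
        order.append(state)
        if not final_state(state):
            for nxt in new_states(state):
                heapq.heappush(pending, (-value(nxt), next(tie), nxt))
    return order


print(hsearch("F. Bloggs", parents_of, is_noble, len))
```

With this data the search reaches the noble ancestor on its third step, because the long name Pilkington-Jones is explored before the Bloggs line.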


5.4.5 Collecting Solutions 


The search programs developed previously can be extended to collect solutions. 

All Solutions. A simple way of collecting all solutions is to augment a search 
program with a difference list, in the same way as Program 5.9 augmented Pro- 
gram 5.8 with a short circuit. The result of this modification is shown in Pro- 
gram 5.12. Each search process adds a solution to the difference list if it encoun- 
ters a final state (R3); otherwise it splits the difference list among its offspring 
(R4). 

search(State,Sb,Se) :-                              % R1
    final_state(State,R), search(State,R,Sb,Se).

search(State,false,Sb,Se) :-                        % R2
    new_states(State,States), search_all(States,Sb,Se).

search(State,true,Sb,Se) :-                         % R3
    Sb := [State|Se].

search_all([State|States],Sb,Se) :-                 % R4
    search(State,Sb,Sm), search_all(States,Sm,Se).
search_all([],Sb,Se) :- Sb := Se.                   % R5


Program 5.12: Collecting All Solutions


A disadvantage of this approach is that solutions may not become available
until the entire search has terminated. In fact, solutions to the right of an infinite
branch will never become available unless the depth of the search is bounded. This
problem can be avoided if each search process is able to report solutions as and
when they are detected. This requires that all search processes be able to write
to a single solution data structure. This can be achieved using a merger. An
all-solutions search program can hence be obtained from Program 5.8 by incorporating
the code required to create a merger, to distribute streams to the merger and to
pass solutions to it.


Program 5.13 shows the modified search algorithm. Solutions become available 
incrementally on the solutions stream (Solutions: R1) as they are generated by 
search processes. This stream is closed if and when the search terminates (R4,6). 


search_init(State,Solutions) :-                     % R1
    merger(Ss,Solutions), search(State,Ss).

search(State,Ss) :-                                 % R2
    final_state(State,R), search(State,Ss,R).

search(State,Ss,false) :-                           % R3
    new_states(State,States), search_all(States,Ss).
search(State,Ss,true) :- Ss := [State].             % R4

search_all([State|States],Ss) :-                    % R5
    Ss := [merge(Ss1)|Ss2],
    search(State,Ss1),
    search_all(States,Ss2).

search_all([],Ss) :- Ss := [].                      % R6


Program 5.13: All-solutions Search


First N Solutions. It is often useful to be able to collect a finite number 
of solutions and then abort further searching. This can be achieved by extending 
Program 5.13 to permit its execution to be aborted. A Strand program is made 
abortable by introducing an additional argument which, when assigned a value,
causes execution to halt. For example, an abortable version of Program 5.13 is
shown in Program 5.14. Note that each rule is augmented with an additional 
argument, Stop, and that the new second rule terminates the search when this 
variable is assigned the value stop. In addition, the guard test unknown(Stop) in 
the first rule ensures that the search is expanded only if the Stop variable has not 
yet been assigned. 


search(State,Ss,Stop) :-
    unknown(Stop) |
    final_state(State,R), search(State,Ss,Stop,R).
search(_,_,stop).

search(State,Ss,Stop,false) :-
    new_states(State,States), search_all(States,Ss,Stop).
search(State,Ss,_,true) :- Ss := [State].

search_all([State|States],Ss,Stop) :-
    Ss := [merge(Ss1)|Ss2],
    search(State,Ss1,Stop),
    search_all(States,Ss2,Stop).

search_all([],Ss,_) :- Ss := [].


Program 5.14: An Abortable Search 


This program can be used by modifying the search_init process in Program 5.13.
The new process creates a selectN process to filter the solution stream; it assigns 
a value to the Stop variable once N solutions have been collected: 


search_init(State,N,NSolns) :-
    merger(Ss,Solns),
    selectN(N,Solns,NSolns,Stop),
    search(State,Ss,Stop).

selectN(N,[S|Solns],NSolns,Stop) :-
    N > 0 |
    N1 is N - 1, NSolns := [S|NSolns1],
    selectN(N1,Solns,NSolns1,Stop).
selectN(0,_,NSolns,Stop) :-
    NSolns := [], Stop := stop.
selectN(_,[],NSolns,_) :- NSolns := [].
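
The role of the Stop variable can be imitated in Python with a threading.Event that is set once and then tested before each expansion. The sketch below is sequential, and the names first_n and report, the callback style and the family tree data (here with two noble ancestors) are assumptions made for the example rather than part of Program 5.14.

```python
import threading

# Hypothetical family tree: each person maps to a list of parents.
TREE = {
    "F. Bloggs": ["S. Bloggs", "J. Pilkington-Jones"],
    "S. Bloggs": ["T. Bloggs", "M. Smith"],
    "J. Pilkington-Jones": ["Count Philip", "A. Miller"],
}
NOBLE = {"T. Bloggs", "Count Philip"}


def parents_of(person):
    return TREE.get(person, [])


def is_noble(person):
    return person in NOBLE


def search(state, new_states, final_state, report, stop):
    if stop.is_set():                  # second rule: the search was aborted
        return
    if final_state(state):
        report(state)
        return
    for nxt in new_states(state):
        search(nxt, new_states, final_state, report, stop)


def first_n(initial, n, new_states, final_state):
    """Collect at most n solutions, then abort the rest of the search."""
    stop = threading.Event()
    solutions = []

    def report(state):                 # plays the part of selectN
        if len(solutions) < n:
            solutions.append(state)
        if len(solutions) >= n:
            stop.set()

    if n <= 0:
        stop.set()
    search(initial, new_states, final_state, report, stop)
    return solutions


print(first_n("F. Bloggs", 1, parents_of, is_noble))
```

Once the flag is set, every remaining call returns immediately, so branches that have not yet been expanded are never explored.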


5.5. The Philosophical Programmers Problem 143 


5.4.6 Discussion 


This section has presented an abstract definition of the search problem that com- 
prises generation, termination and evaluation functions. Strand formulations of 
a number of search strategies have been presented in this framework, including 
exhaustive, depth-first, bounded and heuristic search. Methods for collecting so- 
lutions have also been discussed. 

The programs developed in this section have employed several of the program- 
ming techniques introduced in Chapter 3. A short circuit was used in Program 5.9 
to synchronize the execution of search processes and hence impose a depth-first 
strategy. Program 5.12 used a difference list to collect solutions; this permitted 
many search processes to contribute to a shared solution stream. 


5.5 The Philosophical Programmers Problem 










A software company that employs four programmers wishes to save 
money on computer equipment. The directors decide to provide only 
one terminal and keyboard for every two programmers. Desks are lo- 
cated around the walls of the programming laboratory; keyboards and 
terminals are placed on desks so that each programmer has a keyboard 
on one side and a terminal on the other. However, a programmer may
be forced to wait to use either piece of equipment if it is in use.

Programmers sit with pencil in hand and ponder over problems un- 
til a flash of inspiration reveals a solution. They then seek to obtain 
access to a terminal and keyboard in order to test their solution. Hav- 
ing completed one problem, they relinquish the computer and return 
to pondering the next. Once inspired, a programmer does not return 
to ponder until the solution has been tested. 


This story is the well-known dining philosophers problem, in a more modern set- 
ting. It illustrates three important problems that can occur when concurrent 
processes compete for resources. First, exclusive access is generally required. For 
example, in the story only one programmer should be able to use a terminal or 
keyboard at a time. The problem of achieving exclusive access is known as the 
mutual exclusion problem; we have already encountered this in the Family Pas- 
times example (Section 5.1). 

If concurrent processes can gain exclusive access to resources, then a second 
problem can arise. This is readily illustrated using the story. It is easy to see that 
all programmers can become inspired while in possession of a single component. 
If inspired programmers do not yield components to neighbors, then no other 
programmer will ever be able to proceed. This problem is known as deadlock. 

Deadlock can be avoided by requiring processes that are waiting for resources 
to yield resources that they have already obtained. However, a third, more subtle 
problem can then arise: Two processes can conspire to prevent a third from pro- 
ceeding. For example, in the story, two programmers can gang up on their mutual 
neighbor: Every time the unfortunate programmer obtains one component, the 
appropriate neighbor requests it before the other component becomes available. 
This problem is known as starvation. 

A simple and well-known solution to these problems is to encapsulate all of 
the shared resources in a monitor. A process that wishes to access one or more 
resources makes a request to the monitor. 

Program 5.15 presents a Strand implementation of a monitor solution to the 
philosophical programmers problem. The initial situation consists of four pon- 
dering programmers, a merger that combines messages from these programmers 
and a monitor (R1). The monitor has a local state consisting of the number of 
outstanding requests (initially 0), an empty difference list representing outstanding
requests (Rq,Rq), and a set of available resources (R2). Two sets of components
are available; each is represented by the constant set (R3).

The monitor services a stream of requests for granting and releasing resources. 
In this simple example, the resources that it manages are complete sets of com- 
ponents. A request to grant resources (req(R)) is added to the end of the pending 
request list if no resources are available (R4). If a complete set of resources is 
available, the request is granted immediately (R5). A released resource (rel(R)) is 
used to service an outstanding request if there is one (R6); otherwise, the released 
resource is added to the resource list (R7). 

The behavior of a programmer is specified by rules R9-12. A pondering
programmer can become inspired (R9). An inspired programmer requests resources
from the monitor (R10) and proceeds to start programming when this request
is granted (R12). A programming programmer releases resources and reverts to
pondering (R11).


go :-                                                            % R1
    prog(ponder,P1), prog(ponder,P2),
    prog(ponder,P3), prog(ponder,P4),
    merger([merge(P1),merge(P2),merge(P3)|P4],Rs),
    monitor(Rs).

monitor(In) :- initial_resources(Cs), monitor(In,0,Rq,Rq,Cs).    % R2

initial_resources(Rs) :- Rs := [set,set].                        % R3

monitor([req(R)|In],N,Rf,Rb,[]) :-                               % R4
    Rb := [R|Rb1], N1 is N + 1,
    monitor(In,N1,Rf,Rb1,[]).

monitor([req(R1)|In],N,Rf,Rb,[R|Cs]) :-                          % R5
    R1 := R,
    monitor(In,N,Rf,Rb,Cs).

monitor([rel(R)|In],N,[R1|Rf],Rb,Cs) :-                          % R6
    R1 := R, N1 is N - 1,
    monitor(In,N1,Rf,Rb,Cs).

monitor([rel(R)|In],0,Rf,Rb,Cs) :-                               % R7
    monitor(In,0,Rf,Rb,[R|Cs]).

monitor([],0,_,_,_).                                             % R8

prog(ponder,Rs) :- prog(inspired,Rs).                            % R9

prog(inspired,Rs) :- Rs := [req(R)|Rs1], prog1(inspired,Rs1,R).  % R10

prog(program,Rs) :- Rs := [rel(set)|Rs1], prog(ponder,Rs1).      % R11

prog1(inspired,Rs,set) :- prog(program,Rs).                      % R12


Program 5.15: Solution to Philosophical Programmers Problem.

Let us consider how this program achieves mutual exclusion and avoids both 
deadlock and starvation. The monitor grants each set of resources to a single
programmer at a time; thus, mutual exclusion is ensured. A complete set of resources is granted
together. Thus, no programmer can ever wait for a resource while holding other 
resources; this avoids deadlock. Finally, requests for resources are always granted 
in the order in which they arrive at the monitor and all requests made by pro- 
grammers eventually arrive (this is a property of Strand’s merger: Section 3.4.1). 
Consequently, starvation cannot occur. 




The reader is encouraged to check that requests are indeed granted in the 
order in which they are received. A request may be granted as soon as it arrives if 
resources are available (R5); however, this can only occur if there are no pending 
requests. To verify that the pending request queue is empty when rule R5 is 
selected, observe that the monitor maintains the following invariant: 


“Either the pending requests, the available resources or both are empty”. 


The invariant is initially true (R2). Rule R4 adds to the request set only when the
resource set is empty, and rule R7 adds to the resource set only when the request
set is empty. Rules R5 and R6 do not add to either set and hence maintain the
invariant.

A consequence of the invariant is that rule R5 can only be used when the queue 
is empty. Hence, pending requests are serviced in the order they arrive (R4,6). 
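
The invariant can be checked mechanically in a small Python model of the monitor. The class below is a sequential sketch under stated assumptions: the class and method names, the grant log and the resource names set1 and set2 are invented for the example (Program 5.15 represents both sets by the same constant), and an assertion stands in for the informal argument above.

```python
from collections import deque


class Monitor:
    """Grants complete resource sets in the order requests arrive."""

    def __init__(self, resources):
        self.free = list(resources)    # available resource sets
        self.pending = deque()         # requests waiting, oldest first
        self.grants = []               # (requester, resource) pairs, in order

    def request(self, who):
        if self.free:                  # R5: grant immediately
            self.grants.append((who, self.free.pop(0)))
        else:                          # R4: add to the end of the queue
            self.pending.append(who)
        self._check()

    def release(self, resource):
        if self.pending:               # R6: serve the oldest request
            self.grants.append((self.pending.popleft(), resource))
        else:                          # R7: return it to the free list
            self.free.insert(0, resource)
        self._check()

    def _check(self):
        # Either the pending requests, the available resources or both are empty.
        assert not self.pending or not self.free


m = Monitor(["set1", "set2"])
for programmer in ["p1", "p2", "p3", "p4"]:
    m.request(programmer)
m.release("set1")
m.release("set2")
print(m.grants)
```

Since a request is granted immediately only when the queue is empty, the grant log lists the programmers in exactly the order in which their requests arrived.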

In summary, the Philosophical Programmers problem illustrates the problems 
of deadlock and starvation in concurrent systems. The solution presented here 
avoids both problems by forcing all requests to pass via a monitor. This is a 
centralized solution to these problems; distributed solutions that do not require a 
central manager are also possible but are more complex to implement. 

Program 5.15 uses the producer-consumers and incomplete message protocols 
introduced in Chapter 3. The monitor process used to service requests for resources 
is similar to that employed in the Family Pastimes problem to sequentialize ac- 
cesses to the television. Recall that a similar construct was also used to match 
requests and orders in the Speedy Pizza problem. 


5.6 Summary 


This chapter has used all six of the basic techniques presented in Chapter 3 to 
develop solutions for a number of programming problems. In each case, a problem 
has been described and Strand implementations have been presented. 

The Family Pastimes problem introduced the use of monitors as a mechanism 
for encapsulating and controlling access to shared resources. A monitor provides 
information hiding, mutual exclusion and condition synchronization. The Booking 
Agency problem introduced problems associated with distributed databases and 
showed techniques that support atomic transactions. The Speedy Pizza problem is 
a scheduling problem in which tasks are scheduled in the presence of dependencies. 
The Noble Ancestors problem was used to illustrate how a variety of search strate- 
gies can be expressed in Strand. Finally, the Philosophical Programmers problem 
deals with the problems of deadlock and starvation in concurrent systems. 


Exercises 


5.1 The parents introduced in Section 5.1 change their stance on television and 
agree that their children can watch the educational channel at any time. 
Modify the program to reflect this change in conditions. 


5.2 In another, less disciplined household, the children complain that noise from
the billiards table disturbs their television watching. The parents agree to
play billiards only when the children are not watching television. Develop a
Strand program that implements this situation.

5.3 The following hardware primitives can be used to control a tape drive:
read(Block), write(Block), rewind, seek(N). The first reads the next block, the
second writes to the next block, the third rewinds to the beginning of the
tape and the last seeks to block N. Write a monitor that processes a stream
of requests to read and write specified blocks. The monitor should process
five requests at a time and minimize tape movement.

5.4 Complete the booking system presented in Section 5.2 by defining missing
processes. Make the system robust, so that it deals with invalid requests.
Extend the system to support seats_available and return requests that permit
clerks to enquire about seat availability and return tickets respectively.

5.5 Modify the booking system developed in Exercise 5.4 to incorporate bounded
buffers, as outlined in Section 5.2.

5.6 Theaters have M seats per row, where M is different for each theater. Modify
the clerk program so that whenever possible it allocates tickets for adjacent
seats in the same row.

5.7 The booking system does not permit two multiple bookings to occur
concurrently, even if they affect disjoint sets of events. Specify a system that
allows concurrent multiple bookings.

5.8 The distributed database system presented in Section 5.2 used a centralized
distributor to route requests to databases. Develop a booking system that
does not require a centralized distributor. This can be done in two stages:
First, assuming a fixed number of databases and second, allowing for addition
and deletion of databases.

5.9 Implement a distributed database system for a bank. Accounts are
distributed over different machines using a hash function applied to the
account number. The system should support account enquiries, withdrawals
and transfers. Transactions should be charged for when a customer has less
than 100 units in an account.

5.10 Modify the dictionary process given in Section 5.3.2 to maintain its data set
as a tree.

5.11 Modify the Speedy Pizza program presented in Section 5.3 to simulate the
execution of a set of tasks defined recursively as follows. A task is:

    • A minor task, to be executed directly or

    • A major task consisting of

        1. One task that performs initialization.

        2. A set of independent tasks (these can be executed in parallel, after
           the initialization task completes).

        3. One termination task (which must be executed after all the
           independent tasks complete).

Make each worker service major tasks by executing the initialization task
and then passing the other tasks to the scheduler. (Remember that the
termination task is dependent on the other tasks!)

5.12 Write an exhaustive search program that collects not only final states but
also the path taken to get to each final state.

5.13 A disadvantage of Program 5.11 is that it cannot find solutions at depth
greater than the depth bound k. Iterative deepening avoids this problem by
repeatedly extending the depth bound until a solution is found. Write a
Strand program that applies iterative deepening.

5.14 The list of pending states maintained by Program 5.10 can be maintained
as a queue rather than a stack: Offspring nodes are added to the end rather
than the beginning. This gives a breadth-first search: All nodes at depth N
are explored before any nodes at depth N+1. Extend Program 5.10 to both
implement this behavior and collect all solutions.

5.15 A group of cooks share a kitchen equipped with varying numbers of spoons,
knives, bowls, mixers and plates. Each cook’s recipe needs a different
combination of these utensils. Specify a system that ensures that all cooks
complete their cooking. First, assume that each cook tries to obtain all utensils
before starting cooking. Second, assume that cooks may obtain and release
utensils during cooking, but that they declare before starting the maximum
number of utensils that they will require. Satisfy yourself that both solutions
ensure that deadlock and starvation cannot arise.

5.16 Modify Program 5.15 so that it can be used to simulate the activities of
a group of programmers, each of whom spends a random amount of time
pondering and programming.


Part II 


Advanced Techniques 







Chapter 6 


Programming in the Large 


This chapter is principally concerned with the techniques used to write large Strand
programs. The term programming in the large is used to stress that the design
problems encountered in large programs are qualitatively different from those
encountered in small programs.

All programs benefit from development methodologies that tackle problem 
complexity by decomposition. However, large programs exhibit unique problems 
that require a particular type of decomposition. This chapter introduces a program 
development methodology called modular decomposition that is particularly 
appropriate for large systems. The essence of the methodology is to begin pro- 
gram design by isolating common and changeable program properties. These de- 
sign decisions are then encapsulated inside program units called modules. Strand 
provides simple linguistic support for the definition and use of modules. 


6.1 Principles 


A large program is one that cannot easily be understood by a single individual. It 
may consist of many thousands of lines of code and is often developed by a team of 
software engineers. Large programs represent significant investments of resources. 
Design methodologies should seek to protect this investment by making it easy to 
modify programs to suit changing requirements and to reuse components in new 
systems. 

Modular design is widely accepted as the standard technique for developing 
large systems. A modular design defines a complex system in terms of a collection 
of modules and associated interfaces. Each module implements a particular sys- 
tem component. The module interface specifies functions that are made available 
or exported to other modules; it also specifies functions that are borrowed or 
imported from other modules. In principle, this approach provides four main 
benefits: 




1. Reduced development time. Separate groups should be able to work on 
each module more or less independently. 


2. Flexibility. It should be possible to modify and maintain one module with- 
out affecting others. 


3. Reuse. It should be possible to reuse modules in new systems. 


4. Clarity. The system that is produced should be easier to understand, as 
individual modules can be studied in isolation. 


However, we shall see that these benefits do not automatically derive from the use 
of modules but are dependent on the approach used to modularize a program. 

In earlier chapters, we presented a methodology for dividing tasks into subtasks 
called stepwise refinement. This might appear to be a suitable tool for modulariz- 
ing large programs: Responsibility for each subtask identified in a refinement step 
can be assigned to a separate module. Unfortunately, stepwise refinement has two 
deficiencies as a methodology for modularization. First, it does not recognize com- 
monalities between subtasks; this can lead to a duplication of effort as functions 
are reimplemented in several modules. Second, it does not encourage the recogni- 
tion of program properties that are likely to change during the program life cycle. 
The development of large programs involves many difficult design decisions, some 
of which will inevitably be revoked as requirements evolve. As stepwise refinement 
does not isolate these design decisions, change can require expensive modifications 
to many modules. 

Modular decomposition addresses these problems directly. The basic idea is 
to begin program design by identifying common components and design decisions 
that are likely to change. Module interfaces are then designed to encapsulate 
these common components and to hide design decisions. Explicit recognition of 
common elements avoids duplication of effort and encourages reuse. The hiding 
of design decisions reduces the impact of change. Modular decomposition hence 
introduces a bottom-up component into the design process; in contrast, stepwise
refinement works in a purely top-down fashion.


Modular Decomposition:

    • Identify common components and design decisions that are likely
      to change.

    • Hide these components and design decisions behind interfaces.







The difference between stepwise refinement and modular decomposition tends to 
seem rather abstract unless illustrated using real examples. In subsequent sections 
we take a simple problem, develop modularizations using both techniques, and 
investigate how well the two designs support modification and reuse. 

We choose to investigate the design of a simple interactive line editor that 
permits users to edit named text files by typing a series of simple commands at a 
keyboard. These commands can request the editor to perform various operations 
on specified lines. For example, the editor can display, modify, insert, delete, copy 
and move lines as well as substitute character strings within lines. The editor 
displays the result of executing each command on a screen. When the user exits 
the editor, the modified file replaces the original file on disk. 

This is not, of course, a large system. However, given the impracticality of 
tackling a realistic problem in this context, we must treat this simple line editor 
as if it were a substantial project. 


6.2 Modularization Using Stepwise Refinement 


Recall that stepwise refinement proceeds by a series of refinement steps, each 
of which partitions an initial problem into a set of subproblems. A likely first 
refinement step for the line editor system recognizes that editing a file involves the 
actions: loading the contents of the file into an internal data structure, applying 
a series of commands typed by the user, and finally replacing the original file by 
the edited copy. This decomposition may be expressed as: 


edit(FileName) :-
    load_file(FileName,Document),
    apply_commands(Document,NewDocument),
    save_file(FileName,NewDocument).


A second refinement step decomposes the task of processing commands into two 
subtasks: getting a command from the keyboard and applying that command to 
the document. A termination condition is also required; this is not shown. 


apply_commands(Document,NewDocument) :-
    get_command(Command),
    apply_command(Command,Document,Document1),
    apply_commands(Document1,NewDocument).


A further refinement step partitions the problem of processing commands into the 
subproblems of inserting a new line, moving a line, etc. 


apply_command(insert(N),Document,Document1) :-
    insert(N,Document,Document1).

apply_command(move(M,N),Document,Document1) :-
    move(M,N,Document,Document1).
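The three refinement steps above can be sketched in executable form. The following is a Python analogue, not Strand, and rests on several assumptions that are not in the text: a document is a plain list of lines, commands are tuples, `insert` adds a blank line after line N, `move` makes line M become line N, and a `quit` command provides the termination condition.

```python
def load_file(files, name):
    """Step 1a: copy the named file into an internal list of lines."""
    return list(files[name])


def save_file(files, name, document):
    """Step 1c: replace the original file with the edited copy."""
    files[name] = list(document)


def apply_command(command, document):
    """Step 3: one branch per command, each working on the list directly."""
    kind = command[0]
    if kind == "insert":            # blank line after line n (assumed meaning)
        n = command[1]
        return document[:n] + [""] + document[n:]
    if kind == "move":              # line m becomes line n (assumed meaning)
        m, n = command[1], command[2]
        rest = document[:m - 1] + document[m:]
        return rest[:n - 1] + [document[m - 1]] + rest[n - 1:]
    raise ValueError("unknown command: %r" % (command,))


def apply_commands(commands, document):
    """Step 2: take a command, apply it, continue with the rest."""
    for command in commands:        # the loop stands in for the recursion
        if command == ("quit",):    # assumed termination condition
            break
        document = apply_command(command, document)
    return document


def edit(files, name, commands):
    """Step 1: load the file, apply the commands, save the result."""
    document = load_file(files, name)
    save_file(files, name, apply_commands(commands, document))


# A dictionary stands in for the file system in this sketch.
files = {"notes": ["a", "b", "c"]}
edit(files, "notes", [("insert", 1), ("move", 3, 1), ("quit",)])
```

Note that every branch of `apply_command` manipulates the list representation itself, so each command depends on that representation.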




Each function identified during this refinement process can be placed in a separate 
module. This yields a design that includes the following modules: 


Loader. This module loads the contents of a named file into an inter- 
nal data structure. 


Command. This module takes the data structure produced by the 
Loader module, applies a series of user commands to it, and eventually 
produces a (potentially modified) data structure representing an edited 
document. This module coordinates the actions of other modules, such 
as the Keyboard, Insert and Move modules. 


Saver. This module takes the data structure produced by the Com- 
mand module and saves the document that it represents in a named 
file. 


Insert. This module processes an insert command. It takes the data 
structure representing the document and modifies it by inserting a new 
line at a specified line number. 


Move. This module processes a move command. It takes the data 
structure representing the document and modifies it by moving a spec- 
ified line to a new position. 


Keyboard. This module is concerned with obtaining a command from 
the keyboard. 


Control. This module invokes the other modules to edit a file. 


At first sight, this modularization might appear ideal. Each module is relatively 
small and has a well-defined interface to the rest of the system; in principle, each 
module can be programmed by a single programmer. However, let us consider how 
the requirements for this program might change over time. The following changes 
appear quite reasonable: 


1. Adding a new command. 
2. Changing the command syntax. 


3. Dealing with documents that are larger than available memory. This requires 
changing the internal data structure used to hold documents. 


4. Changing the editor to run in batch rather than in interactive mode. 


The first change requires changes to the Keyboard module (which reads and parses
commands) and the Command module, which applies commands. In addition, we
will probably have to implement the new command from scratch, as although the
code required to perform common operations has already been written, it is
distributed throughout other modules. (Recall that each command is in a separate
module.) The second change only requires modification of the Keyboard module;
however, this change is relatively complex, as the reading of characters and parsing
of commands are combined in that module. The third change requires modification
of all but the Keyboard and Control modules. The final change requires
modification of all modules that perform input from the keyboard or output to the
screen. Input is limited to the Keyboard module; however, output is distributed
throughout all modules.


1. The syntax of the user command language.

2. The meaning of the various user commands.

3. Operating system calls that the editor must generate to read commands and
display text.

4. Operating system calls that the editor must issue to read and write files.

5. Data structures used by the editor to represent a document.


Figure 6.1 Problematic design decisions in the line editor

These examples show a deficiency of this modularization: Changes that appear 
minor in a conceptual sense (such as changing a single data structure) turn out 
to be difficult to implement. A second deficiency is that there is considerable 
duplication of functionality in the various modules. For example, the Move, Loader 
and Insert modules must all implement similar operations on the document data 
structure. 

A final deficiency is that none of the modules can readily be reused in other 
programs. For example, a colleague developing a system to keep track of lunch 
dates might wish to reuse our code. The problem of maintaining an ordered list 
of lunch dates is similar to that of maintaining an ordered list of editor lines. 
However, the code that our colleague would seek to reuse is scattered throughout 
several modules. 


6.3 Using Modular Decomposition 


We now apply modular decomposition to develop an alternative modularization 
of the line editor. This approach encourages us to think first about the structure 
of a problem rather than how the problem is to be solved. In particular, we make 
a list of problematic design decisions, as illustrated in Figure 6.1. All of these 
decisions are likely to change in the editor life cycle and hence should be hidden 
within modules. 

In addition, we consider what components of the editor may be reusable in
other systems and whether pre-existing components can be employed. As noted
previously, the component of the line editor that maintains the internal data
structure is likely to be useful in other programs. It is hence desirable to encapsulate
this data structure and its associated operations in a single module. This has
the additional benefit of satisfying point (5) in Figure 6.1. The parser used to
recognize user commands is another potentially reusable component. This
emphasizes the benefits of making the parser table-driven so as to facilitate changes
to command language syntax.


[Diagram: the Editor module at the top; the Parser and Document modules below it;
the Input, Output and Disk modules at the bottom. Arrows link each module to the
modules whose functions it uses.]

Figure 6.2 Alternative Line Editor Modularization

These concerns lead us to identify a need for six modules: 


1. Input. This module obtains characters from the keyboard. 


2. Parser. This module translates characters typed at the keyboard into com- 
mands. 


3. Editor. This module obtains user commands and processes them with re- 
spect to the document. 


4. Output. This module displays editor output on a screen. 


5. Document. This module maintains the internal representation of the doc- 
ument. 


6. Disk. This module reads and writes files in the file system. 


The relationship between these modules is represented in Figure 6.2. An arrow 
linking two modules indicates that the first uses functions provided by the second. 
For example, the Parser module may request the Input module for characters typed 
at the keyboard. This diagram makes clear the dependencies between modules. 
For example, a change to the Parser module interface may require changes to 
both the Editor and Input modules. However, changes to the internal details of 
the parser are isolated from the rest of the system. 


6.3.1 Interface Specifications 


Interface specifications can now be developed for the various modules. This in- 
volves specifying the functions exported by each module and the functions that 
each module imports from other modules. A description of the linguistic support
used to define module interfaces in Strand will be given later. For the present,
it is sufficient to recognize that interfaces can be defined in terms of either mes- 
sages or process invocations. Both forms of interface are used in the following 
specifications. 

The Input Module. This module exports a single process definition chars(Cs).
Cs is a stream to which other processes can append incomplete messages of the
form get_char(C) and get_status(S). These represent requests for characters typed
at the keyboard and keyboard status information respectively. Each variable C
is assigned an integer representing an ASCII value; S is assigned the integer 1 to
signify “alive” and 0 otherwise.
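An incomplete message carries an unbound variable that the receiver fills in. The sketch below approximates this in Python; it is an analogue rather than Strand. It assumes `concurrent.futures.Future` as a stand-in for a single-assignment variable, and the `typed` argument is a hypothetical substitute for the keyboard.

```python
from concurrent.futures import Future


def chars(requests, typed):
    """Serve a stream of requests, binding the variable carried by each message.

    `requests` is an iterable of (kind, variable) pairs; `typed` stands in
    for characters arriving from the keyboard (an assumption of this sketch).
    """
    pending = iter(typed)
    for kind, variable in requests:
        if kind == "get_char":
            # Bind C to the integer code of the next character.
            variable.set_result(ord(next(pending)))
        elif kind == "get_status":
            # Bind S to 1, signifying "alive".
            variable.set_result(1)
        else:
            raise ValueError("unknown request: %r" % (kind,))


# The sender creates the unbound variables and reads them after the reply.
c, s = Future(), Future()
chars([("get_char", c), ("get_status", s)], "q")
```

The sender never learns how the Input process obtains characters; it only sees the values that arrive in `c` and `s`.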

The Output Module. This supports a request to display a line on the 
screen. It exports a single process definition display_In(L,D). L is a list of terms to 
be displayed and D is a variable that is to be assigned the value “done” to signal 
completion. 

The Parser Module. This supports a request for the next command typed by
the user. It exports a single process definition, commands(Cs), where Cs is a stream
to which other processes can append messages of the form get_comm(C). Each
variable C is assigned a string or tuple representing a command; for brevity the
possible commands are not listed here. The module imports the process definition
chars from the Input module.

The Disk Module. This exports a single process definition disk(Ds). Ds is a 
stream to which other processes can append the following message types: 


    open_file(F,Id)     Open file F and return its integer identifier Id.
    read_line(Id,L)     Attempt to read a line from file Id; return as L
                        either a line or an end-of-file marker.
    write_line(Id,L)    Write line L to file Id.
    close_file(Id)      Close file Id.


The Document Module. This maintains an internal work space representing 
the document being edited. It exports a single process definition document(Ds). 
Ds is a stream to which other processes can append the following message types: 


    load_document(F)    Load a document F into the internal work space.
    save_document(F)    Save contents of internal work space in file F.
    get_line(N,L)       Return line number N as L.
    put_line(N,L)       Store L as line number N.
    insert_line(N)      Insert a blank line after line number N.
    delete_line(N)      Delete line number N.


Both file names and lines are represented as strings. This module imports the 
process definition disk from the Disk module. 

The Editor Module. This repeatedly obtains commands from the user, 
processes them with respect to the document, and displays a result on the screen. 
It imports the process definitions commands, document and display_In from the 
Parser, Document and Output modules respectively. 




Most module interfaces have been defined in terms of processes to which other 
modules send messages. It is important to understand that a definition in terms 
of process invocations would have been equally valid. For example, the Document 
module interface can also be defined to export the following process definitions: 


    load_document(F,D)     Construct D, a representation of the document F.
    save_document(F,D)     Save document D in file F.
    get_line(D,N,L)        Return line number N of document D as L.
    put_line(D,N,L)        Store L as line number N of document D.
    insert_line(D,N,D1)    Insert a line after line number N of document D.
    delete_line(D,N,D1)    Delete line number N of document D.


These process definitions encode operations on a data structure D representing 
a document. Both module interfaces provide the same functionality. However, 
although the second hides the operations on the document data structure, it does 
not hide the structure itself. In general, a module interface is best implemented 
using messages when the definition of a data structure must be hidden. 
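The difference between the two interface styles can be illustrated with a Python analogue (not Strand). The sketch assumes a list of strings as the document representation and covers only four of the six operations; loading and saving are left out. In the message style the list is private to the object. In the invocation style the caller holds the list and could bypass the functions altogether.

```python
class Document:
    """Message style: callers send messages and never see the storage."""

    def __init__(self):
        self._lines = []            # hidden representation (assumed: a list)

    def send(self, message):
        kind, args = message[0], message[1:]
        if kind == "get_line":
            return self._lines[args[0] - 1]
        if kind == "put_line":
            self._lines[args[0] - 1] = args[1]
        elif kind == "insert_line":     # blank line after line N
            self._lines.insert(args[0], "")
        elif kind == "delete_line":
            del self._lines[args[0] - 1]
        else:
            raise ValueError("unknown message: %r" % (message,))
        return None


# Invocation style: the structure d is passed in and a new one handed back,
# so its representation is visible to every caller.
def insert_line(d, n):
    return d[:n] + [""] + d[n:]


def get_line(d, n):
    return d[n - 1]


doc = Document()
doc.send(("insert_line", 0))
doc.send(("put_line", 1, "hello"))

exposed = insert_line([], 0)        # the caller now owns a bare list
```

Changing `Document` to store its lines in a tree or on disk would leave message senders untouched, whereas any caller that had indexed `exposed` directly would break.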

After all interface specifications have been defined, the system specification is 
complete. Each module can then be implemented independently using stepwise 
refinement. This process begins with the interface specification and incremen- 
tally refines it to yield actual code. Refinement of different modules may proceed 
independently. 


6.3.2 Evaluation of the Second Modularization 


Let us now consider how easily the modifications applied to the first design can 
be applied to this new design. 


1. Adding a new command requires modification of the Parser module (to recog- 
nize the new syntax) and the Editor module (to execute the command). The 
new command can be coded easily as the document management functions 
required to implement commands are localized in the Document module. 


2. Changing command syntax only requires modification of the Parser module. 


3. An alternative document data structure only requires modification of the 
Document module; this is the only module that has access to this structure. 


4. An editor that runs in batch rather than in interactive mode can be con- 
structed by modifying just the Input and Output modules: These must read 
from and write to files rather than the keyboard and screen. 
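The fourth modification can be sketched as follows, again in Python rather than Strand. The class names, the `get_char` method and the `read_command` helper are hypothetical; the only point carried over from the design is that the parser depends on the Input interface and not on where the characters come from.

```python
import io
import sys


class KeyboardInput:
    """Interactive Input module: characters come from standard input."""

    def get_char(self):
        return sys.stdin.read(1)


class BatchInput:
    """Batch Input module: characters come from a prepared script."""

    def __init__(self, script):
        self._stream = io.StringIO(script)

    def get_char(self):
        return self._stream.read(1)


def read_command(source):
    """Parser-side helper: collect characters up to the end of a line.

    It uses only source.get_char(), so either Input module will do.
    """
    collected = []
    while True:
        ch = source.get_char()
        if ch in ("", "\n"):
            return "".join(collected)
        collected.append(ch)


batch = BatchInput("insert 3\nquit\n")
first = read_command(batch)
second = read_command(batch)
```

Switching to batch mode replaces `KeyboardInput` with `BatchInput`; `read_command` is unchanged.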


Consider also what components the second line editor provides that can be reused 
in other programs. First, it provides a module capable of managing an in-memory 
text file (Document). Second, it provides a line-based file I/O module (Disk). 
Third, it provides a command parser. Fourth, it provides keyboard input and
screen output facilities. Each of these modules is self-contained provided that the
interface protocols are maintained. 

Deciding on an appropriate set of modules is not always easy. The following 
rules can aid in this choice. 


1. A module should do one thing well.

2. Every module should hide something.

3. Only simple data structures should be passed across interfaces.

4. Data organization (i.e., data structures and associated operations) should
be hidden inside modules.

5. System-dependent features (such as byte ordering or the number of
computers) should be hidden.


The second line editor design illustrates the application of these rules. Each mod- 
ule defined in this design performs a single function or related group of functions 
and every module hides something. The Input, Output and Disk modules hide the 
potentially system-dependent mechanisms used to obtain characters from the key- 
board, display characters on the screen and access disk files. The Parser module 
hides user command syntax. The Editor module hides the definitions of user com- 
mands. The Document module hides the internal representation of documents. 
Finally, intermodule interfaces are simple: Only strings and numbers are passed 
between modules. 


6.4 Linguistic Support for Modularization 


Modularization is a design methodology: It does not imply a need for specific 
support in a language. However, linguistic support can reduce the risk of pro- 
gramming error by enforcing the information-hiding aspects of modules. Strand 
support for modularization consists of a syntax for specifying exported processes 
and a mechanism for invoking processes imported from other modules. In this 
section two of the line editor modules are implemented to illustrate the language 
features. 

In Strand, the term “module” is used to refer to a set of process definitions plus
an exports statement that names certain definitions as visible to other modules.
For example, Program 6.1 presents a partial implementation of the Output module
defined in Section 6.3.1. Recall that the interface definition for this module
specifies that a single process definition is exported: display_In/2. The module
contains a variety of process definitions used to implement this process. An exports
statement is used to specify that the process display_In/2 is accessible to other
modules. The module design was developed by a stepwise refinement of the
interface definition.

-exports([display_In/2]).

display_In(L,D) :- display_line(L,done,D).

display_line([T|L],D1,D3) :-
    display_term(T,D1,D2),
    display_line(L,D2,D3).
display_line([],D1,D2) :- new_line(D1,D2).

display_term(T,done,D2) :- ...

new_line(done,D2) :- ...


Program 6.1: Output Module


A module imports a process definition exported by another module by making
an explicit intermodule call. This initiates execution of a process defined in
another module. An intermodule call has the form:


    Name:Process


and causes Process to be executed using a module named Name. For example, if
the Output module is named output, then the following intermodule call initiates
execution of the display_In process.


    output:display_In("hello world",D)


This causes "hello world" to be displayed on the screen and then assigns a value
to the variable D.


    • A Strand module is a set of process definitions plus an exports
      statement.

    • A module can import process definitions exported by other modules
      using the call (:) operator.
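The information-hiding effect of an exports statement can be mimicked in Python. The sketch below is an analogue under stated assumptions, not Strand's mechanism: the `Module` class and `make_output_module` are hypothetical, the completion variable D is modeled as a return value, and a list stands in for the screen.

```python
class Module:
    """A set of definitions plus the names that other modules may call."""

    def __init__(self, name, definitions, exports):
        self.name = name
        self._definitions = dict(definitions)
        self._exports = set(exports)

    def call(self, process, *args):
        """Rough counterpart of Name:Process, restricted to exported names."""
        if process not in self._exports:
            raise LookupError("%s does not export %s" % (self.name, process))
        return self._definitions[process](*args)


def make_output_module(screen):
    """Build a stand-in Output module that 'displays' onto a list."""

    def display_line(terms):            # internal helper, not exported
        screen.append(" ".join(str(t) for t in terms))

    def display_In(terms):              # the single exported definition
        display_line(terms)
        return "done"

    return Module(
        "output",
        {"display_In": display_In, "display_line": display_line},
        exports=["display_In"],
    )


screen = []
output = make_output_module(screen)
d = output.call("display_In", ["hello world"])
```

A call such as `output.call("display_line", ["x"])` raises `LookupError`, because `display_line` is not named in the exports.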





6.4.1 The Editor Module 


Program 6.2 is a partial implementation of the Editor module defined in Sec- 
tion 6.3. The exports statement declares edit/1 as the only process definition visi- 
ble to other modules. The module imports functions from the Parser, Document 
and Output modules. 




-exports([edit/1]).

edit(F) :-                                              % R1
    parser:commands(Ks),
    document:document([load_document(F)|Ds]),
    process_commands(F,Ks,Ds,done).

process_commands(F,Ks,Ds,D) :-                          % R2
    data(D) |
    Ks := [get_comm(K)|Ks1],
    process_commands1(F,Ks1,Ds,K).

process_commands1(F,Ks,Ds,insert(N)) :-                 % R3
    Ds := [insert_line(N)|Ds1],
    output:display_In([N],D),
    process_commands(F,Ks,Ds1,D).
process_commands1(F,Ks,Ds,move(M,N)) :-                 % R4
    Ds := [get_line(M,L),
           delete_line(M),
           insert_line(N),
           put_line(N,L)|Ds1],
    output:display_In([N,L],D),
    process_commands(F,Ks,Ds1,D).

process_commands1(F,Ks,Ds,quit) :-                      % R5
    output:display_In([quit],_),
    Ks := [],
    Ds := [save_document(F)].


Program 6.2: The Editor Module. 




The editor initially uses the call operator (:) to invoke processes defined in the 
Parser and Document modules (R1). These processes will parse incoming com- 
mands and manage a document. An initial message of the form load_document(F) is 
sent to the document process to request it to load the file F (R1). Subsequent rules 
generate messages to the document and commands processes. These messages re- 
quest commands (R2) and changes to the document (R3-5). Explicit intermodule 
calls to the Output module are used to display lines on the screen (R3-5). 


6.5 Advanced Programming with Modules 


Strand’s call operator (:) provides a simple intermodule call mechanism that is 
adequate for most purposes. However, some programmers may require more ex- 
plicit control over code mapping or may wish to implement alternative types of 
intermodule call. Strand provides two low-level features to support these activ- 
ities: Code is a first-class data object and processes can be explicitly executed 
using specified code. The rest of this chapter is concerned with the use of these 
two aspects of the language. 

A Strand module has an external and internal representation. The external 
representation is program source code; the internal representation is an executable 
program that can also be manipulated as a data object. Compilation translates 
a module such as Program 6.1 into a Strand data type called a code module. 
Code modules can be included in data structures, passed between processes and 
tested for equality. Strand provides a predefined process that permits programs 
to execute processes using a specific code module. A call to this process has the 
form: 


run(Module, Process) 


where Module is a code module and Process is a process to be executed. For 
example, if M is a code module representing the compiled form of the Output 
module (Program 6.1), then the process: 


run(M,display_In(“hello world”,D)) 


initiates execution of the process display_In(“hello world”,D). 

The run process is powerful but unwieldy to use because it requires that all 
modules to be called in a program be passed as arguments to that program. For 
example, the following rule from Program 6.2: 


edit(F) :-
    parser:commands(Ks),
    document:document([load_document(F)|Ds]),
    process_commands(F,Ks,Ds,done).


could be written using the run process as: 




edit(F,P,D,O) :-
    run(P,commands(Ks)),
    run(D,document([load_document(F)|Ds])),
    process_commands(F,Ks,Ds,done,O).


The Parser, Document and Output modules must be passed as arguments (P, 
D, O) to the edit process. Note how the output module O is passed on to the 
process_commands process to permit subsequent intermodule calls. 
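The burden of passing code modules explicitly shows up in any language where code is a data object. The following Python sketch is an analogue, not Strand: a code module is assumed to be a dictionary of functions, a process is a tuple of a name and its arguments, and the three stand-in modules at the end are hypothetical.

```python
def run(module, process):
    """Execute `process`, a (name, arg, ...) tuple, using the given module."""
    name, args = process[0], process[1:]
    return module[name](*args)


def edit(filename, parser, document, output):
    """Every module the editor calls has to arrive as an argument."""
    commands = run(parser, ("commands",))
    run(document, ("load_document", filename))
    # The output module is handed on, just as O is passed to process_commands.
    return process_commands(commands, output)


def process_commands(commands, output):
    for command in commands:
        run(output, ("display_In", [command]))
    return len(commands)


# Hypothetical stand-ins for the compiled Parser, Document and Output modules.
screen, loaded = [], []
parser = {"commands": lambda: ["insert(1)", "quit"]}
document = {"load_document": loaded.append}
output = {"display_In": screen.append}
handled = edit("notes.txt", parser, document, output)
```

Each additional module used anywhere below `edit` adds another parameter to every function on the path to it, which is the inconvenience the text describes.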

Due to this disadvantage, the run process is used primarily for implementing 
higher-level intermodule call constructs. To illustrate this, we describe two ap- 
proaches to the implementation of the call operator (:). Both are based on source- 
to-source transformation; the reader is referred to Chapter 9 for a discussion of 
how to implement program transformations in Strand. 

The call operator permits programs to refer to modules by name rather than as 
code modules. The implementations presented here translate references to named 
modules into references to code modules and explicit calls to the run process. 


6.5.1 Run-Time Resolution of Names 


The first approach associates a process called a module server with a set of pro- 
cesses that make intermodule calls. The processes are each given a stream to 
the module server. This stream is used to send messages to the module server 
requesting it to execute processes located in other modules. The source-to-source 
transformation applied to programs that use call statements can be illustrated 
using a simple example. Consider the following rules from Program 6.2. 


-exports([edit/1]).

edit(F) :-
    parser:commands(Ks),
    document:document([load_document(F)|Ds]),
    process_commands(F,Ks,Ds,done).

process_commands(F,Ks,Ds,D) :- ...

These rules are translated to:

-exports([edit/2]).

edit(F,Ms) :-
    Ms := [parser:commands(Ks),
           document:document([load_document(F)|Ds])|Ms1],
    process_commands(F,Ks,Ds,done,Ms1).

process_commands(F,Ks,Ds,D,Ms) :- ...




Each process in the Editor module is augmented with a request stream Ms. In 
addition, intermodule calls such as: 


    parser:commands(Ks)

are translated into messages on this stream, as follows:

    Ms := [parser:commands(Ks)| ... ]


Execution of the transformed program hence generates a stream of request mes- 
sages with the form ModuleName:Process. 

Program 6.3 defines a module server process that maintains a data structure 
containing code modules. The server takes three arguments: a stream of inter- 
module calls (Rs), a dictionary of {Name,Module} pairs, and an error stream (Es) 
on which calls to unknown modules are reported. 


server([N:P|Rs],Modules,Es) :-                          % R1
    lookup(N,Modules,Mod),
    try(P,Mod,Es,Es1),
    server(Rs,Modules,Es1).
server([],_,Es) :- Es := [].                            % R2

try(Process,Mod,Es,Es1) :-                              % R3
    module(Mod) |
    run(Mod,Process), Es := Es1.
try(Process,not_found,Es,Es1) :-                        % R4
    Es := [call_error(Process)|Es1].

lookup(Name,[{Name,Module}|_],Mod) :-                   % R5
    Mod := Module.
lookup(Name,[{Name1,_}|Modules],Mod) :-                 % R6
    Name =\= Name1 | lookup(Name,Modules,Mod).
lookup(_,[],Mod) :- Mod := not_found.                   % R7


Program 6.3 A Module Server. 


Each time the module server receives a message representing an intermodule 
call, it searches the dictionary for the named module (R1,5-7). If the module is 
found, the run process is used to initiate execution of the given process using the 
stored module (R3). Otherwise, an error is signaled on the error stream (R4). 
Note the use of the guard test module (R3); this succeeds if its argument is a code 
module. 
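
The behavior of Program 6.3 can be mimicked in an ordinary sequential language. The following Python sketch is a model only, not Strand: it assumes that a module is a dictionary mapping process names to functions, and the names serve, lookup and NOT_FOUND are invented for illustration. The request stream becomes a list of requests and the error stream becomes the list that serve returns.

```python
# Sequential model of the module server in Program 6.3 (illustrative only).
# Assumption: a "module" is a dict mapping process names to Python callables.

NOT_FOUND = "not_found"

def lookup(name, modules):
    """Linear search of a list of (name, module) pairs, as in rules R5-R7."""
    for module_name, module in modules:
        if module_name == name:
            return module
    return NOT_FOUND

def serve(requests, modules):
    """Handle (module_name, process_name, args) requests; return the errors."""
    errors = []
    for module_name, process, args in requests:
        module = lookup(module_name, modules)
        if module is NOT_FOUND:
            errors.append(("call_error", process))    # rule R4
        else:
            module[process](*args)                    # rule R3: run(Mod,Process)
    return errors

log = []
parser = {"commands": lambda ks: log.append(("parser", ks))}
modules = [("parser", parser)]

errors = serve([("parser", "commands", ("Ks",)),
                ("printer", "print", ())], modules)
```

The call to the known module runs, while the call to the unknown printer module is reported as an error. The search in lookup is repeated on every request, which is the overhead that the schemes described next seek to avoid.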


6.5.2 Load-Time Resolution of Names 


The implementation scheme presented in the previous section resolves module 
names at run-time. This is simple and flexible. However, the overhead of gener- 
ating a message and searching the module list is incurred on every intermodule 
call. 

This overhead can be avoided if, as in the editor example, module names are 
known at compile time. A tuple containing the appropriate code modules can 
be passed as an argument to processes at load-time, i.e., when a computation 
is initiated. These processes can then use the predefined process get_arg to
obtain particular tuple arguments. This permits them to access modules without
communication and searching overhead. For example, the intermodule call:


    parser:commands(Ks)

can be translated into calls to the following predefined processes:

    get_arg(1,Modules,Mod),
    run(Mod,commands(Ks))


The tuple Modules is assumed to contain the Parser module in its first argument; 
the run process is used to initiate direct execution of this module. This transfor- 
mation yields the following Editor module. 


-exports([edit/2]).

edit(F,Ms) :-
    get_arg(1,Ms,CMod),
    run(CMod,commands(Ks)),
    get_arg(2,Ms,RMod),
    run(RMod,document([load_document(F)|Ds])),
    process_commands(F,Ks,Ds,done,Ms).

process_commands(F,Ks,Ds,D,Ms) :- ...


The tuple Ms is accessed to retrieve the Parser and Document modules. It is 
also passed to the process process_commands; this allows that process to import 
processes from the Output module. 


6.5.3 Compile-Time Resolution of Names 


A final approach to implementation of intermodule calls avoids all overhead by 
copying code at compile-time. For example, the contents of the Output module 
can be copied into, and compiled with, the Editor module. This approach requires 
that processes be renamed to avoid conflicts when two modules contain processes 
with the same name. A disadvantage of the approach is that it prevents modules 
from being compiled independently. For example, the entire Editor module must 
be recompiled if the Output module is modified. 


6.6 Code Mapping 


Since a code module is a valid data type, it can be passed between processes 
located on different computers. This permits the programmer to implement spe- 
cialized code-mapping strategies for particular applications. To illustrate how this 
is achieved we present, in Program 6.4, the implementation of a simple load-on- 
demand mechanism. This consists of a modified module server with a stream 
(MRs) to a host process. The module server requests code modules from the host 
process if they are not in a local dictionary. 


server(Rs,MRs,Es) :- server(Rs,[],MRs,Es).                  % R1

server([Name:Process|Rs],Modules,MRs,Es) :-                 % R2
    lookup(Name,Modules,Mod),
    server1(Rs,Modules,MRs,Es,Name,Process,Mod).
server([],_,MRs,Es) :- MRs := [], Es := [].                 % R3

server1(Rs,Modules,MRs,Es,_,Process,Mod) :-                 % R4
    module(Mod) |
    run(Mod,Process),
    server(Rs,Modules,MRs,Es).
server1(Rs,Modules,MRs,Es,Name,Process,not_found) :-        % R5
    MRs := [load(Name,Mod)|MRs1],
    server2(Rs,Modules,MRs1,Es,Name,Process,Mod).

server2(Rs,Modules,MRs,Es,Name,Process,Mod) :-              % R6
    module(Mod) |
    run(Mod,Process),
    server(Rs,[{Name,Mod}|Modules],MRs,Es).
server2(Rs,Modules,MRs,Es,Name,Process,not_found) :-        % R7
    Es := [load_error(Name,Process)|Es1],
    server(Rs,Modules,MRs,Es1).


Program 6.4: Module Server with Code Mapping. 


The code dictionary is initially empty (R1). The server searches this dictionary 
when it receives an intermodule call (R2). If the named module is not found, a 
message load(Name,Mod) is generated on the module request stream (R5). This 
requests the named module. If a module is returned in response to this request, 
the pending process is executed and the module is added to the server’s dictionary 
(R6). Otherwise, an error is signaled (R7). 

Load requests generated by this program can be processed in a variety of ways. 
Typically, they will be passed to a host computer which will either access a cache 
of frequently requested modules or load the required module from disk. 
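
The load-on-demand behavior can again be modeled sequentially. The Python sketch below is an illustration under stated assumptions, not a rendering of Program 6.4: the function load stands in for the host process and returns either a module (a dictionary of functions) or None, and the names make_server, call and host_load are hypothetical.

```python
# Model of the load-on-demand server in Program 6.4 (illustrative only).
# Assumption: `load` stands in for the host process; it returns a module
# (here a dict of callables) or None if the module cannot be supplied.

def make_server(load):
    cache = {}      # the local code dictionary, initially empty (R1)
    errors = []     # models the error stream Es

    def call(name, process, *args):
        module = cache.get(name)                  # R2: search the dictionary
        if module is None:
            module = load(name)                   # R5: request from the host
            if module is None:
                errors.append(("load_error", name, process))    # R7
                return None
            cache[name] = module                  # R6: remember the module
        return module[process](*args)             # R4/R6: run the process

    return call, errors

host_requests = []

def host_load(name):
    """Hypothetical host: records each request and serves one known module."""
    host_requests.append(name)
    store = {"math": {"double": lambda x: 2 * x}}
    return store.get(name)

call, errors = make_server(host_load)
first = call("math", "double", 4)
second = call("math", "double", 5)     # served from the local dictionary
missing = call("graphics", "draw")     # the host cannot supply this module
```

Only the first call to each module reaches the host; the second call to math is satisfied locally, and the failed request for graphics is recorded as an error.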


6.7 Summary 


This chapter has been concerned with the particular problems that are encountered 
when developing large systems. It has introduced a program development strat- 
egy that employs both modular decomposition and stepwise refinement. Modular 
decomposition is used to specify a set of modules that encapsulate changeable and 
common aspects of a program. Individual modules are then developed by stepwise 
refinement of their interface specifications. 

The chapter has also introduced linguistic support for modular programming. 
This consists of the exports statement, used to specify which processes are visible 
outside a module, and an intermodule call operator, which allows modules to im- 
port processes defined in other modules. 


Exercises 


6.1 Apply modular decomposition in the design of a program that emulates 
an electronic calculator. That is, identify important modules and specify 
interfaces for these modules. 


6.2 Implement the calculator designed in Exercise 6.1. Then extend it to provide 
further functions: for example, trigonometric functions as well as arithmetic. 
How effective was your modularization? 


6.3 Implement a simple line editor using the modularization described in Sec- 
tion 6.3. Use a simple in-memory representation for documents and the 
Strand I/O facilities. 


6.4 Modify the line editor described in Exercise 6.3 so that it can edit files that 
are too large to hold in memory. A sensible organization stores lines from 
the document being edited in a scratch file on disk. A separate, in-memory 
index records the location of each line in the scratch-file. Utilize a module 
that provides scratch-file management functions. 


6.5 Implement the following distributed code management scheme. N loader 
processes are connected to each other via streams. Each process: 


1. Receives messages via some stream requesting that code modules be
loaded.


2. Caches modules. 


3. Requests modules not already loaded from other loader processes. If 
no loader has the code, then it is requested from a host process. 





Chapter 7 


Integrating Existing Code 


A large body of sequential codes has been developed over the years in Fortran, C, 
Cobol and other languages. This represents a massive investment in both money 
and expertise that cannot be disregarded when we move to parallel computers. It 
will rarely be feasible to reprogram large applications in parallel languages. Nor 
is it necessarily desirable: Clever people have expended a great deal of energy 
making these programs work well, albeit only on single computers. 

Fortunately, an alternative to reprogramming exists. This chapter describes 
a multi-lingual approach that permits existing sequential codes to be integrated 
into Strand programs. This allows the codes to be executed on parallel computers 
without extensive reprogramming. 

In general, parallel programming requires that a problem be partitioned into 
subproblems that can be executed concurrently. Partitioning is usually achieved 
by decomposing either the problem structure or its data domain. In principle, 
code can be reused if each subproblem can be implemented by some component 
of an existing program. The decomposition into existing parts can be difficult if
a program does not have a modular structure. However, it is our experience that 
programs developed using the methodologies described in previous chapters are 
easy to decompose. 


A more troubling aspect of reusing code is managing its parallel execution. 
The programmer must be able to specify how subproblems are distributed to 
computers and how program components communicate and synchronize. This is 
where Strand can be a useful tool: It provides an interface to other languages 
that permits sequential code and associated data structures to be encapsulated in 
Strand processes. These processes can perform the necessary communication and 
synchronization. In addition, they can readily be distributed, using techniques to 
be described in Chapter 8. This yields multi-lingual programs where the basic 
structure is expressed in Strand and low-level details are implemented by
pre-existing code. The systems that result can be viewed as heavyweight processes
floating in a sea of lightweight Strand processes: Data passes between the
heavyweight processes via streams. Figure 7.1 illustrates this organization: A
database management system, written in Cobol, interacts with other processes
that utilize Fortran and C code.


Figure 7.1 Integrating sequential code.


Reasons for multi-lingual programming:

• Reuse sequential code on multiprocessors.
• Access specialized hardware.
• Extend Strand.
• Build convenient interfaces.





Reuse of existing code is not the only reason for multi-lingual programming. 
Strand’s interface to other languages can be used to access specialized hardware 
such as vector processors. It can also be used to extend Strand with additional 
data structures and associated operations. These operations can support graph- 
ics facilities, mathematical libraries, databases, etc. Finally, many older software 
packages use clumsy parameter-driven user interfaces; Strand can be used to im- 
plement more convenient interfaces to these packages. 

Although integration is often cost-effective, it inevitably introduces additional 
complexity for the programmer. In consequence, existing sequential programs 
should only be reused if rewriting in Strand would require a substantial investment. 
Similarly, Strand processes should only be reimplemented in another language if 
the result will be significantly more efficient. 


Only integrate if the result is a tangible saving in development
time or a markedly more efficient program.





As an example of a case where a multi-lingual approach is not appropriate, con- 
sider a Strand program that frequently searches lists. A programmer might be 
tempted to speed up this program by coding a “list search” operation in a low- 
level language. This solution is idiotic: Traversing lists will require linear time 
in any language. In contrast, trees can be searched in O(log N) time and a hash
table (implemented using a tuple) can for the most part be accessed in constant 
time. 
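
The difference between these access costs does not depend on the implementation language. As a rough illustration, the following Python sketch looks up the same key in three ways. It is our own example, not taken from any Strand system: binary search over a sorted array stands in for a search tree, and a Python dictionary stands in for the tuple-based hash table.

```python
import bisect

def linear_search(items, key):
    """O(N): examine items one by one, as any list traversal must."""
    for index, item in enumerate(items):
        if item == key:
            return index
    return -1

def binary_search(sorted_items, key):
    """O(log N): halve the search interval at each step."""
    index = bisect.bisect_left(sorted_items, key)
    if index < len(sorted_items) and sorted_items[index] == key:
        return index
    return -1

keys = list(range(0, 2000, 2))                           # 1000 sorted even numbers
table = {key: index for index, key in enumerate(keys)}   # hash table

target = 1998                     # the last key: worst case for linear search
found_linear = linear_search(keys, target)    # about 1000 comparisons
found_binary = binary_search(keys, target)    # about 10 comparisons
found_hash = table.get(target, -1)            # typically a single probe
```

All three searches return the same position; only the amount of work differs. Recoding linear_search in a faster language changes the constant factor but not the linear growth.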


Multi-lingual programming is not a substitute for good
algorithms.





7.1 Designing the Interface 


The design of a multi-lingual program has both a top-down and a bottom-up 
component. Stepwise refinement provides a top-down structure; pre-existing code 
imposes the bottom-up component. Program refinement proceeds until it produces 
both a Strand program and a set of specifications. These specifications define func- 
tions that are implemented in other languages. The integration of these functions 
into Strand programs requires the definition of new data types and operations. 

A data structure implemented in another language, e.g. a Fortran vector, can 
be incorporated into Strand as a user-defined data type. A user-defined data 
type encapsulates an arbitrary byte sequence and can usefully be thought of as a 
black box. Strand programs can include these boxes in data structures and pass 
them between computers; however, they cannot look inside them. Black boxes can 
only be created and accessed by user-defined operations. These operations are 
not part of the Strand system. They invoke code written in other languages to
perform tests and transformations on the data. For example, a user-defined guard
test might verify that a data type is a vector. A user-defined body process might 
invoke a vector processor to multiply two vectors and produce a third. 


Support for integration:

• User-defined data type encapsulates arbitrary data.
• User-defined operations invoke code in other languages.





The interface between Strand and other components of a multi-lingual program 
will hence be defined in terms of one or more data structures plus a set of user- 
defined operations. The operations generally fall into four groups: 


1. Tests. These verify that user-defined data satisfies certain criteria. 


2. Conversions. These convert between user-defined and Strand representa- 
tions of data. 


3. I/O. These input or output user-defined data. 


4. Computation. These perform operations on user-defined and/or Strand
data.


The operations in the fourth group do the actual work; all other operations are 
concerned with interfacing. 

Let us now consider how to utilize a vector processor in more detail. Assume 
that the processor in question expects its data to be provided as a linear array. 
To avoid the overhead of converting between Strand data types (such as lists) and 
linear arrays each time the vector processor is invoked, a VECTOR data type is 
defined; this contains a vector of floating point numbers. The following operations 
can be defined on the vector data type. 


Operation             Guard/Body     Description

vector(V)             guard test     Succeed if V is a vector, otherwise fail.
inner_prod(X,Y,Z)     body process   Z is the inner product of vectors X and Y.
vector_size(V,S)      body process   V is a vector and S is its size.
vector_to_list(V,L)   body process   L is a list of reals in vector V.
list_to_vector(L,V)   body process   V is a vector of elements in the list L.


The first operation is a user-defined guard test; it permits Strand programs to
distinguish vectors from other data types. The other operations are user-defined
body processes. They invoke the vector processor or perform various
transformations on the data. Using these primitives, it is now possible to write
Strand programs that use the vector processor. For example, Program 7.1 computes
the matrix multiplication C = A × B^T. A is represented by a list of row vectors;
the transposed matrix B is represented by a list of column vectors. The program
applies the inner_prod process to multiply each pair of vectors (R3). The
list_to_vector process is used to generate a new column vector from a list of
result values (R1).


mm([A|As],Bs,Cs) :-                     % R1
    row(A,Bs,Rs),
    list_to_vector(Rs,C),
    Cs := [C|Cs1],
    mm(As,Bs,Cs1).
mm([],_,Cs) :- Cs := [].                % R2

row(A,[B|Bs],Rs) :-                     % R3
    inner_prod(A,B,R),
    Rs := [R|Rs1],
    row(A,Bs,Rs1).
row(_,[],Rs) :- Rs := [].               % R4


Program 7.1 Matrix Multiplication 
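
The structure of Program 7.1 can be checked on a small example. The Python sketch below is a sequential model, not Strand: ordinary lists of floats stand in for the VECTOR data type, and inner_prod is computed directly rather than by a vector processor. Each entry of the result is the inner product of a vector from the first list with a vector from the second list, so passing the rows of B as the second list yields A × B^T.

```python
def inner_prod(x, y):
    """Stand-in for the user-defined inner_prod operation."""
    return sum(a * b for a, b in zip(x, y))

def row(a, bs):
    """Rules R3-R4: one result value per vector in bs."""
    return [inner_prod(a, b) for b in bs]

def mm(a_rows, bs):
    """Rules R1-R2: one result vector per row vector of A."""
    return [row(a, bs) for a in a_rows]

A = [[1.0, 2.0],
     [3.0, 4.0]]
B = [[5.0, 6.0],
     [7.0, 8.0]]

# The rows of B are the columns of its transpose, so this is A x B^T.
C = mm(A, B)
```

For these inputs the first entry is 1×5 + 2×6 = 17, and the full result is [[17, 23], [39, 53]].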


7.2 Implementing the Interface 


Assume that a set of operations and their interface to Strand has been defined. 
Assume also that routines that perform these basic operations already exist. The 
problem that remains is to construct an interface between the existing code and 
Strand. For example, in the previous section we specified an inner_prod process 
and may have a Fortran implementation of this function available. Our problem 
is to provide code that permits calls to inner_prod in Strand programs to invoke 
the Fortran routine. The interface code must perform three functions: 


1. Suspend until required inputs are available. 
2. Invoke existing code to perform computation. 


3. Succeed or fail in the case of a guard test; generate output in the case of a 
body process. 


Strand systems provide interface builders and macro packages that avoid most of 
the labor associated with these activities. In fact, the programmer frequently only 
need provide a statement declaring the name of the user-defined operation and 
a description of its arguments. These tools are system and language dependent;
rather than discuss a particular implementation, we focus here on illustrating the
concepts used to define interfaces. We assume a typical interface tool kit; readers 
should refer to their system documentation for the available tools. 

Let us examine how the vector manipulation operations introduced in Sec- 
tion 7.1 are implemented using these concepts. The guard test vector is expressed 
using a single declaration: 


    101 guard vector <> (VECTOR?) -


This statement specifies a unique key (101), the type of the operation (guard), 
the name used to invoke the operation in Strand programs (vector), the name 
of the routine that implements the operation (<> indicates that no routine is 
to be called), a description of the operation’s arguments (VECTOR?) and the 
implementation language (no language). VECTOR is a symbolic constant used to 
identify the vector data type and is declared to the interface builder. 

An argument description specifies whether an argument is used for input (?)
or output (^). It can also indicate the argument type. In the vector example, 
the argument description causes the interface builder to generate code to perform 
two actions. First, synchronization code causes the test to suspend if its input 
argument is not available. Second, type-checking code causes the test to fail if 
the argument is not a user-defined data type VECTOR. This provides a complete 
implementation of the vector test; no other routines need be called. 
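
The two actions generated for the description VECTOR? can be pictured as a three-way decision. The following Python sketch is a model of that decision only; the names UNBOUND, UserData and vector_guard are hypothetical and do not correspond to anything produced by a real interface builder.

```python
UNBOUND = object()          # stands for a variable that has no value yet

class UserData:
    """A black box: a type tag plus an opaque byte sequence."""
    def __init__(self, type_tag, data):
        self.type_tag = type_tag
        self.data = data

def vector_guard(argument):
    """Outcome of the generated test for the description VECTOR?."""
    if argument is UNBOUND:
        return "suspend"            # synchronization: input not yet available
    if not isinstance(argument, UserData) or argument.type_tag != "VECTOR":
        return "fail"               # type check: not a VECTOR
    return "succeed"

outcomes = [vector_guard(UNBOUND),
            vector_guard([1.0, 2.0]),
            vector_guard(UserData("VECTOR", b"\x00" * 16))]
```

An unbound argument suspends the test, a Strand list fails it, and a black box tagged VECTOR passes it, without any user-written routine being called.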

A more complex interface is required for the inner_prod process. This interface 
consists of both a declaration and procedure. The declaration has the form: 


    102 body inner_prod c_ip (VECTOR?,VECTOR?,real^) C


This states that inner_prod has identifier 102, is a body process and calls the C 
procedure c_ip. It also states that the process expects two input arguments, both
vectors, and generates a real as output. 

The C procedure c_ip must be provided to implement the rest of the interface. 
As arguments are passed using pointers (by reference), the C procedure receives 
two pointers to user-defined data types and one pointer to a real data type. The 
procedure uses two pre-defined macros, Size and Data, to access the size and data 
components of a user-defined data type and invokes the vector processor using 
vp_ip. The value returned by vp_ip is returned as the result of the computation.


c_ip(X,Y,Z)
USERPTR X,Y;
REALPTR Z;
{
    /* Pass the vector length and both data areas to the vector processor. */
    *Z = vp_ip(Size(X),Data(X),Data(Y));
}


In summary, the inner_prod process is implemented in two stages: The
declaration statement instructs the interface builder to generate code that checks
that the two input arguments are available and are vectors. It also generates code
to convert the result generated by vp_ip into a Strand real. The C procedure c_ip
accesses input argument components and invokes the vector processor.


7.3 Six Design Principles 


Macros and definition statements reduce the risk of error when implementing in- 
terfaces. Programmers are nevertheless responsible for using these tools correctly. 
Practical experience indicates that adherence to the following principles signifi- 
cantly reduces the risk of error: 


P1  Obey strict directionality. The arguments of user-defined operations
    should be used for either input or output, not both. Thus, operations should
    never assign values to variables contained in input arguments: They should
    only produce values on output arguments.


P2  Provide error arguments. Errors in body process execution should be
    signaled by binding an output argument. In contrast, guard tests simply fail
    if their arguments are erroneous.


P3  Use types. User-defined data types should be used in meaningful ways.


P4  Accomplish effects. An operation should either completely accomplish
    its desired effect or do nothing. It should not terminate leaving open files,
    unfreed memory, etc.


P5  Do not store pointers. User-defined data types should not contain
    pointers. This permits Strand to relocate data during garbage collection or
    when migrating data between computers.


P6  Obey the single-assignment rule. Recall that this rule states that a
    variable cannot be modified once it has been assigned a value. This permits
    non-variable data to be copied between computers without concern for the
    consistency of the copies. User operations should not therefore modify data
    except by binding variables.


Examples in subsequent sections illustrate the application of these principles,
which we should generally seek to obey. However, we shall demonstrate that
it is reasonable and valuable to violate principles P4, P5 and P6 under certain
controlled circumstances.


Principles for integration: 


P1. Obey strict directionality. 

P2. Provide error arguments. 

P3. Use types. 

P4. Accomplish effects. 

P5. Do not store pointers. 

P6. Obey the single-assignment rule. 





7.3.1 Relaxing the Single-Assignment Rule (P6) 


In order to obey the single-assignment rule, subsystems implemented in other 
languages must construct new copies of data items that are to be modified. For 
example, consider a database system, supported using a user-defined data type 
DATABASE plus the following operations. 


    empty_database(Database)
    lookup(Key,Value,Database)
    modify(Key,Value,Database,NewDatabase)


The operations initialize a database and allow it to be accessed or modified. An 
implementation of the modify process that obeys the single-assignment rule would 
have to construct a new copy (NewDatabase) of the database to which it is applied 
(Database) to change a single value. This is not feasible if the database is large and 
frequently modified. Hence, it is necessary to relax the single-assignment rule and 
allow the data structure representing the database to be modified. This violates 
principle P6. 

Data structures of this type can be used safely provided that we ensure they are 
not shared between processes. Thus, programs employing user-defined operations 
behave in the same way even if the single-assignment rule is not obeyed. Sharing 
can be prevented by applying two simple rules: 


Encapsulate Mutable Structures. A mutable data structure should 
be encapsulated within a process to which other processes send mes- 
sages. 


Sequence Accesses. All operations on mutable data performed within 
the encapsulating process must be strictly sequenced. 


To understand the reason for the second rule, consider the program: 


database(Rs) :-
    empty_database(Db),
    database(Rs,Db).

database([read(Key,Value)|Rs],Db) :-
    lookup(Key,Value,Db),
    database(Rs,Db).
database([write(Key,Value)|Rs],Db) :-
    modify(Key,Value,Db,NewDb),
    database(Rs,NewDb).


This program uses the user-defined operations introduced earlier to process read 
and write requests on a database. The read request returns the value associated 
with a key from the database. If the single-assignment rule is obeyed, this value is 
always that provided by the last write request using the same key. For example, 
the following process always assigns the value 33 to the variable X. 


    database([write(john,32),write(john,33),read(john,X)])


However, if the single-assignment rule is not obeyed, then the value returned by 
a read request is dependent on the order in which lookup and modify processes 
are executed. The example process could hence assign either 32 or 33 to X. This 
time-dependency is undesirable but can be avoided by sequencing operations on 
the database. This is achieved using data-flow synchronization: 


database(Rs) :-
    empty_database(Db),
    database(Rs,Db,go).

database([read(Key,Value)|Rs],Db,Synch) :-
    data(Synch) |
    lookup(Key,Value,Db),
    database(Rs,Db,Value).
database([write(Key,Value)|Rs],Db,Synch) :-
    data(Synch) |
    modify(Key,Value,Db,NewDb),
    database(Rs,NewDb,NewDb).


Note that the modify process, which modifies the database, simply returns the
old database, updated. The explicit return of the data structure permits other 
processes to synchronize with the completion of the update. Note also how the 
value output by the lookup and modify processes is passed to the recursive database 
process. Thus, the data tests ensure that no further messages are serviced until 
the previous database access is complete. The order of accesses to the database 
hence corresponds to the order in which messages are received. In consequence, 
the program behaves in the same manner even if the single-assignment rule is not 
obeyed. Thus, the modify process can safely update the existing database and 
return it as the new database. 
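
The two rules, encapsulation and sequencing, are not specific to Strand. The Python sketch below applies them to an ordinary mutable dictionary. It is an analogy under stated assumptions rather than a translation: a thread plays the role of the encapsulating process, a queue plays the role of the request stream, and sequencing follows from the single loop that services one message at a time rather than from a Synch variable.

```python
import queue
import threading

def database(requests):
    """Owns the mutable dictionary; no other thread ever touches it."""
    db = {}                                   # updated in place (violates P6)
    while True:
        message = requests.get()              # service one message at a time
        if message is None:                   # end of the request stream
            return
        if message[0] == "write":
            _, key, value = message
            db[key] = value
        elif message[0] == "read":
            _, key, reply = message
            reply.put(db.get(key))

requests = queue.Queue()
monitor = threading.Thread(target=database, args=(requests,))
monitor.start()

reply = queue.Queue()
requests.put(("write", "john", 32))
requests.put(("write", "john", 33))
requests.put(("read", "john", reply))
x = reply.get()                               # value from the most recent write

requests.put(None)
monitor.join()
```

Because accesses occur in the order in which messages are received, the read always observes the second write, just as in the synchronized Strand program.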


7.3.2 Allowing Unaccomplished Effects (P4) 


Principle P4 states that a user operation should close any files it has opened, free 
any memory it has allocated, etc. Unfortunately, this is not always feasible. For 
example, it is clear that a process defined to create a window, to which other 
processes must subsequently write, cannot be expected to delete that window 
before it completes. Similarly, a process that initializes an in-core database may 
need to request memory from the operating system that it does not deallocate. 

These and similar violations of principle P4 are acceptable if effects can be 
accomplished at a global level: That is, if programs invoking these processes guar- 
antee to eventually accomplish all effects. For example, a program may guarantee 
to close all windows that it has opened and to deallocate all memory allocated for 
databases. As these actions must be performed even if the program terminates 
abnormally, careful programming is required. A useful technique is to establish 
a Manager with which other processes register unaccomplished effects to be dealt 
with upon program termination. 
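
A minimal sketch of such a Manager is given below in Python. It is one possible design, not a component of any Strand system: the class and its methods register and shutdown are invented for illustration, and windows are modeled by a set of names.

```python
class Manager:
    """Records unaccomplished effects and accomplishes them on shutdown."""

    def __init__(self):
        self._pending = []

    def register(self, description, undo):
        """Remember an action that must eventually be performed."""
        self._pending.append((description, undo))

    def shutdown(self):
        """Run the undo actions in reverse order of registration."""
        completed = []
        while self._pending:
            description, undo = self._pending.pop()
            undo()
            completed.append(description)
        return completed

open_windows = set()
manager = Manager()

def create_window(name):
    open_windows.add(name)                    # effect left unaccomplished
    manager.register("close " + name, lambda: open_windows.discard(name))

def run_program():
    create_window("editor")
    create_window("log")
    raise RuntimeError("abnormal termination")

try:
    try:
        run_program()
    finally:
        closed = manager.shutdown()           # runs even after a failure
except RuntimeError:
    pass
```

The shutdown step is placed where it executes whether or not the program fails, so both windows are closed despite the abnormal termination. Python's standard contextlib.ExitStack provides a similar registry of callbacks.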


7.3.3 Storing Pointers (P5) 


Principle P5 forbids pointers (absolute addresses) in user-defined data structures, 
as these prevent the structures from being relocated. In practice, two types of 
stored pointer can be distinguished: 


1. Pointers to Strand data. For example, a user-defined data type that includes 
pointers to itself. 


2. Pointers to data created outside Strand using some feature of the underlying 
operating system. For example, a file handle returned by a system call. 


The first use can be circumvented by using relative offsets. The second use is 
acceptable if care is taken to ensure that this data is never copied to another 
computer. Once again, this can be achieved by encapsulating the data structure 
within a process. 
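
The use of relative offsets can be illustrated with a small linked structure packed into a single byte sequence. In the Python sketch below, which is our own example with an invented node layout, each node stores a value and the distance to the next node rather than its absolute address. The structure can therefore be copied to a different position and still be traversed.

```python
import struct

# Node layout (an assumption of this sketch): two little-endian 32-bit
# integers, the value and the offset of the next node relative to this one.
NODE = struct.Struct("<ii")

def build(values):
    """Pack values into one buffer; an offset of 0 marks the last node."""
    buffer = bytearray(NODE.size * len(values))
    for i, value in enumerate(values):
        step = NODE.size if i < len(values) - 1 else 0
        NODE.pack_into(buffer, i * NODE.size, value, step)
    return buffer

def traverse(memory, base):
    """Follow relative offsets starting at absolute position `base`."""
    values = []
    position = base
    while True:
        value, step = NODE.unpack_from(memory, position)
        values.append(value)
        if step == 0:
            return values
        position += step

block = build([10, 20, 30])
original = traverse(block, 0)

# "Relocate" the block: copy it to a different position in a larger memory.
memory = bytearray(64)
memory[40:40 + len(block)] = block
relocated = traverse(memory, 40)
```

Both traversals yield the same values. Had the nodes stored absolute positions, the copy at position 40 would still refer to locations in the original block.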


7.4 A Database Management System 


We conclude this chapter by developing components of a multi-lingual program 
that provides Strand programs with an interface to a database management sys- 
tem. This system will be accessible, and perhaps familiar, to most readers. The 
ndbm library is included in many UNIX systems and provides a variety of database 
management functions. We define a set of user-defined operations that permit 
Strand programs to use these functions to access databases. 

This problem is interesting because an efficient interface requires that three of 
the principles stated in Section 7.3 be violated: effects cannot be accomplished, 
pointers must be stored and the single-assignment rule must be broken. The ndbm 
library allows a programmer to open a database and obtain a handle that can be 


7.4. A Database Management System 179 


used for subsequent accesses. The programmer uses this handle to perform a 
sequence of operations and eventually close the database. Opening and closing a 
database on each access would be expensive; thus the handle must be maintained 
between accesses. A Strand interface must therefore provide an operation that 
opens a database and returns the handle. This operation leaves the database 
open and therefore fails to accomplish effects. In addition, the handle is a pointer 
to a structure maintained by UNIX; hence, a user-defined data type contains a 
pointer. Finally, the database management functions invoked by Strand processes 
do not obey the single-assignment rule: They modify the contents of the database 
referenced by the handle. 

These violations are acceptable if handles are used in a controlled way. In 
particular, the following conditions must be satisfied by programs that use the 
database: 


1. Handles are not passed between computers. 

2. All open databases are closed before termination. 
3. Databases are not accessed before they are opened. 
4. Databases are not accessed after they are closed. 

5. Reads and writes to a database are sequenced. 


6. The same database is not opened more than once before being closed. 


Conditions 1, 2 and 5 avoid potential problems due to violation of principles P4, 
P5 and P6; the other conditions are imposed by the ndbm library. In the following 
discussion, we first define the interface to the database system; we then consider 
how to structure Strand programs that use this interface. 


7.4.1 The Interface 


A handle is represented in Strand with a user-defined data type DATABASE. Six 
user operations provide access to the ndbm library: 


database(DbId): (Guard) Succeeds if DbId is a DATABASE and fails
    otherwise.


db_open(Name,DbId,Result): (Body) Opens a database with a given
    Name, returns a handle DbId and signals the result of this operation
    using Result.


db_close(DbId,Result): (Body) Closes the database with handle DbId and
    signals a Result.


db_fetch(DbId,Key,Record,Result): (Body) Retrieves the Record associated
    with Key in the database with handle DbId and signals a Result.


db_store(DbId,Key,Record,Result): (Body) Stores a {Key,Record} pair in
    the database with handle DbId and signals a Result.


db_delete(DbId,Key,Result): (Body) Deletes a Key from the database
    with handle DbId and signals a Result.


Implementations of two of these operations are presented here. The db_open op- 
eration requires the definition statement: 


    103 body db_open c_db_open (string?,DATABASE^,string^) C


This statement signifies that the db_open operation is defined by a C procedure 
named c_db_open. It expects a string as an input argument and returns both a
database and a string as output arguments. The C procedure is defined using 
pre-defined macros: 


c_db_open(Name,Db,Result)
STRING Name;
USERPTR Db;
STRING Result;
{   Handle h;
    h = dbm_open(Name);
    if (h == 0){
        *Result = "open error";
        return;
    }
    BuildUser(Db,DATABASE,sizeof(Handle));
    Data(Db) = h;
    *Result = "ok";
}


The definition statement instructs the interface builder to generate code that en- 
sures the first argument (Name) is a Strand string. This is then converted to a 
C string. The C procedure can thus immediately attempt to open the named 
database by calling the library function dbm_open; this function returns a handle
or the integer 0 to signal an error. If the database cannot be opened, the pro- 
cedure returns a result of “open error”. If the database is opened, a user-defined 
data type DATABASE is built using the BuildUser macro. The data part of the new 
data type is then filled with the handle using the Data macro. Finally, the return 
of a Result “ok” indicates successful completion of the procedure. 
The db_fetch operation is implemented in a similar way:


    104 body db_fetch c_db_fetch (DATABASE?,string?,string^,string^) C


c_db_fetch(Db,Key,Record,Result)
USERPTR Db;
STRING Key, Record, Result;
{   DbRec rec;
    rec = dbm_fetch(Data(Db),Key);
    if (rec.size <= 0){
        *Result = "not found";
        return;
    }
    *Record = rec.data;
    *Result = "ok";
}


Note the use of the Data macro to access the data component of the user-defined 
data type containing the handle. The library function dbm_fetch is used to retrieve
the record with a given key from the database. This record is returned as a Strand 
string. 

It is interesting to evaluate this interface with respect to the six principles 
introduced in Section 7.3. All six processes are strictly directional (P1). Those 
processes that can generate an error either have an explicit error argument or an 
output argument that can be used to signal errors (P2). User-defined data items 
representing handles are distinguished by a type DATABASE (P3). Only db_open 
does not accomplish effects (P4). All six operations make use of stored pointers in 
the form of a handle; hence they violate principle P5. The db_store and db_delete
operations violate the single-assignment rule (P6) by modifying the database. 


7.4.2 A Database Monitor 


We can now provide Strand programs that use the operations to provide access to 
databases. Recall the six conditions that these programs must satisfy: 


1. Handles are not passed between computers.

2. All open databases are closed before termination.

3. Databases are not accessed before they are opened.

4. Databases are not accessed after they are closed.

5. Reads and writes to a database are sequenced.

6. The same database is not opened more than once before being closed.


Conditions 1 and 3-5 can be satisfied by encapsulating each open database within 
a monitor (cf. Section 5.1). Program 7.2 implements this monitor; for brevity, it
does not check for, or report, errors. A process of the form: 


db(Name,Rs, Result) 




opens a database Name and creates a server; this responds to fetch, delete and 
store requests on a request stream (Rs). The variable Result is used to signal when 
the database is closed. 


db(Name,Rs,D) :-                                          % R1
    db_open(Name,DbId,R),
    dbm(DbId,Rs,D,R).

dbm(DbId,Rs,D,R) :-                                       % R2
    data(R) | dbm1(DbId,Rs,D).

dbm1(DbId,[fetch(Key,Rec,R)|Rs],D) :-                     % R3
    db_fetch(DbId,Key,Rec,R), dbm(DbId,Rs,D,R).

dbm1(DbId,[delete(Key,R)|Rs],D) :-                        % R4
    db_delete(DbId,Key,R), dbm(DbId,Rs,D,R).

dbm1(DbId,[store(Key,Rec,R)|Rs],D) :-                     % R5
    db_store(DbId,Key,Rec,R), dbm(DbId,Rs,D,R).

dbm1(DbId,[],D) :-                                        % R6
    db_close(DbId,R), closed(R,D).

closed(R,D) :- data(R) | D := closed.                     % R7


Program 7.2: Database Monitor


Initially, the program opens the database (R1) and subsequently services re- 
quests received on the request stream in order of arrival (R2-6). Sequencing is 
achieved by passing the result generated by a db_fetch, db_delete or db_store opera- 
tion to dbm1 (R3-5). The dbm process suspends until this result is available (R2). 
This is achieved using the predefined data test. The monitor closes the database 
and terminates when the request stream is closed (R6). The variable D is assigned 
the value closed when the db_close call has completed (R7).

The encapsulation of a database in a monitor can be used to ensure that handles 
do not migrate between computers; only the monitor has access to a handle. The 
sequencing of accesses ensures that a database is not accessed before it is opened 
or after it is closed. 
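
The sequencing behaviour of the monitor can be sketched in Python. This is only an illustration under stated assumptions: a dictionary (the hypothetical FAKE_DISK) stands in for the dbm library, a list of tuples stands in for the request stream, and the name db_monitor is ours rather than part of any Strand system. The point is that one loop owns the handle and starts a request only after the previous one has produced its result.

```python
# Hypothetical sketch of Program 7.2: one loop owns the database "handle"
# and services fetch/store/delete requests strictly in arrival order.
FAKE_DISK = {}  # assumption: stands in for files managed by the dbm library


def db_monitor(name, requests):
    """Open `name`, service each request in order, then close.

    Each request is a tuple: ("fetch", key), ("store", key, record)
    or ("delete", key). Returns the per-request results and the final
    status "closed", mirroring the variable D in Program 7.2.
    """
    handle = FAKE_DISK.setdefault(name, {})  # db_open: the handle never leaves this function
    results = []
    for request in requests:  # the next request starts only after this one ends
        op, key = request[0], request[1]
        if op == "fetch":
            results.append(handle.get(key, "not found"))
        elif op == "store":
            handle[key] = request[2]
            results.append("ok")
        elif op == "delete":
            removed = handle.pop(key, None)
            results.append("ok" if removed is not None else "not found")
        else:
            results.append("unknown request")
    return results, "closed"  # db_close, then D := closed
```

Because the handle is a local variable of db_monitor, no other part of the program can reach the database except through the request list.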


7.4.3 A Database Manager 


The database monitor does not, by itself, address the remaining conditions. If 
any process can create a database monitor, it is possible for databases to remain 
open after the program that opened them terminates. It is also possible for the 
same database to be opened more than once. These problems can be solved by 
providing a manager that coordinates the actions of the monitors. A program 
that needs to access databases is required to do so in a disciplined way: It must
request the manager for permission to open a database and signal when it closes
the database. The manager implemented in Program 7.3 ensures that no database 
is opened more than once concurrently. 


manager(Rs) :- manager(Rs,[]).                            % R1

manager([open(Name,Result)|Rs],Dbs) :-                    % R2
    member(Name,Dbs,R),
    result(R,Name,Result,Dbs,NewDbs),
    manager(Rs,NewDbs).

manager([close(Name)|Rs],Dbs) :-                          % R3
    delete(Name,Dbs,NewDbs),
    manager(Rs,NewDbs).

result(true,_,Result,Dbs,NewDbs) :-
    Result := already_open, NewDbs := Dbs.
result(false,Name,Result,Dbs,NewDbs) :-
    Result := permission_granted, NewDbs := [Name|Dbs].


Program 7.3: Database Manager


The manager maintains an initially empty list of currently open databases (R1). 
It uses this list to process open and close requests. An open request is allowed 
if the named database is not in the list (R2); a close request causes the named 
database to be removed from the list (R3). 
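
The bookkeeping performed by the manager can be sketched in Python as follows. The function name and the tuple encoding of requests are assumptions made for illustration; only the open/close logic is taken from Program 7.3.

```python
def manager(requests):
    """Process ("open", name) and ("close", name) requests in order.

    Returns one reply per open request: "permission_granted" or
    "already_open". Close requests produce no reply.
    """
    open_dbs = []  # initially empty list of open databases (R1)
    replies = []
    for op, name in requests:
        if op == "open":
            if name in open_dbs:  # membership test (R2)
                replies.append("already_open")
            else:
                open_dbs = [name] + open_dbs  # NewDbs := [Name|Dbs]
                replies.append("permission_granted")
        elif op == "close":  # remove the name from the list (R3)
            open_dbs = [n for n in open_dbs if n != name]
    return replies
```

A second open request for the same name is refused until a close request for that name has been processed.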

Any process that may need to access databases is given a stream to the man- 
ager. The process is required to execute code similar to that shown in Program 7.4 
if it needs to create a database monitor. 


attempt_open(Name,Rs,R,Ms) :-                             % R1
    Ms := [open(Name,CmR)|Ms1],
    attempt_open(Name,Rs,R,Ms1,CmR).

attempt_open(_,_,R,Ms,already_open) :-                    % R2
    R := already_open, Ms := [].
attempt_open(Name,Rs,R,Ms,permission_granted) :-          % R3
    db(Name,Rs,R),
    close(Name,R,Ms).

close(Name,closed,Ms) :- Ms := [close(Name)].             % R4


Program 7.4: Opening a Monitor 




The attempt_open process generates an open request to the manager (R1); if
permission to open the database is granted, a database monitor and a close process 
are created (R3). The close process subsequently generates a close message to 
signal when the database is closed (R4). 

The manager and attempt_open processes ensure that the same database is not
opened more than once before being closed. The remaining requirement, that
databases are closed when the program that opened them terminates, can also
be satisfied through disciplined programming.


7.4.4 Discussion 


This section has shown how a familiar database system can be integrated into 
Strand. In practice, this type of task may be performed for a number of reasons. 
A Strand program may require an efficient database system; using an existing 
system saves the effort of writing one in Strand. Alternatively, an existing database 
package may require a more convenient interface, or may need to be used on a 
parallel computer. 

The design of the database subsystem presented here violates three of the 
principles presented in Section 7.3: It does not always accomplish effects, it stores 
pointers and it does not obey the single-assignment rule. In consequence, care 
must be taken when writing programs that use its facilities. A number of simple 
programming techniques that avoid common problems have been presented. Op- 
erations on mutable data structures are sequenced by encapsulating the structure 
in question within a process. A manager was introduced to coordinate the opening 
and closing of databases. 


7.5 Summary 


Parallel computers offer cost-effective computing power but present a serious engi- 
neering problem. What is to be done with existing sequential codes? It is certainly 
not feasible to throw them away or to rewrite them. Strand can provide a solu- 
tion to this problem by coordinating the execution of existing sequential codes on 
multicomputers. This is achieved by integrating these codes and their data into 
multi-lingual programs. 

The use of Strand as an integration language is supported by user-defined data 
types and operations. User-defined data types permit Strand programs to ma- 
nipulate data without concern for its internal structure. User-defined operations 
permit Strand programs to execute code written in other languages. This code 
typically, but not always, defines operations on user-defined data types. 


Exercises 


7.1 Study your Strand system documentation to determine precisely how
    programs written in your favorite language are integrated. Use the available
    tools to incorporate some simple mathematical functions into Strand as user
    operations, for example, sine and cosine.

7.2 Implement the vector operations described in Section 7.2. In the absence of
    a vector processor, you will have to program the vector operations yourself.

7.3 Construct an interface to a database system by implementing processes
    similar to those defined in Section 7.4. If your Strand system runs on a UNIX
    machine, you may choose to implement an interface to the databases dbm
    or ndbm.

7.4 Extend the manager process given in Section 7.4.3 so that all open
    databases are closed before the program terminates. This can be achieved by
    associating a variable with the central manager and passing this variable to
    every database monitor created. The monitor is extended to close its
    database if this variable is assigned a value.

7.5 Implement an interface to your local graphics library to provide simple
    line-drawing facilities. Write a Strand program that uses these facilities to
    draw a maze and make a mouse walk through the maze. This program should
    use the search techniques described in Chapter 5.





Chapter 8 


Process Mapping 


Previous chapters have described a variety of concurrent programs. Although these 
programs can execute on parallel computers, the details of how this is achieved 
have not been examined. Utilizing parallel hardware requires that two central 
problems be overcome: partitioning the problem into concurrent components 
and allocating (or mapping) these components to computers. The partitioning 
problem is often approached by decomposing either a problem’s function or its 
data. We will illustrate both functional and data decompositions in the course of 
this chapter. The allocation problem is complicated if the structure of a problem 
is irregular or dependent on its data. In this case, a load balancing algorithm 
must be used to dynamically allocate units of work to computers. Strand does not 
solve these problems but rather provides simple tools to help in solving them. 

The tools provided are based on the concept of a virtual machine. Each 
virtual machine is composed of a collection of connected computing sites called 
nodes. Many Strand processes may execute at each node in a virtual machine. 
Tools are provided to specify where, within a machine, processes execute. 

Strand supports a variety of virtual machines that differ in the manner in which 
nodes are connected. The connection topologies are designed to be convenient 
programming structures and correspond to familiar organizations encountered in 
problem solving, e.g., ring, mesh, tree, etc. This approach has three advantages: 


Ease of Programming. Since virtual machines are based on con- 
venient problem-solving structures, they are easier to program than a 
particular hardware configuration. The choice of which virtual machine 
to use for an application is based solely on programming convenience. 


Device Independence. The same virtual machines are supported on 
a variety of architectures; thus, moving programs from one machine to 
another is relatively straightforward. 


Scaleability. Programs are scaleable in that they may be written to 
execute on many thousands of nodes. On a small parallel architecture 
many nodes are automatically packed into a single computer. Thus,
when additional computers are purchased, a program can utilize them
automatically by reducing the packing density. 


The number and variety of virtual machines supported by the Strand system are 
likely to change over time. The commands required to allocate a virtual machine 
of a particular size and type are described in the system documentation. This 
also specifies how to execute a program on an allocated virtual machine. We 
will not deal with these system dependencies in this chapter. Instead, we concern 
ourselves with illustrating how to use two important virtual machines: the ring and 
torus. In addition, we will show how to define load balancing strategies using these 
concepts. Our emphasis throughout is on programming technique; no regard is 
paid to performance issues and tradeoffs. We strongly urge readers to experiment 
with the techniques and develop their own experience. 


8.1 Ring Mappings 


A ring virtual machine is a collection of nodes connected to form a ring. It is 
possible to write programs that spawn processes around the ring. This is achieved 
using a direction annotation that specifies the direction of spawning. An anno- 
tation of the form @fwd or @bwd spawns a process to the next node in a clockwise 
or counter-clockwise direction, respectively. For example, consider the rule: 


talk :- display("Hello World")@fwd.


Figure 8.1 shows how this program operates: Each circle represents a node in the 
virtual machine and processes are shown in the node where they execute. 





Figure 8.1: A Ring Virtual Machine 




Execution of the talk process at any node in the virtual machine causes a 
predefined display process to be spawned to a clockwise adjacent node. When the 
display process executes, the message “Hello World” is printed. 

With this simple tool, the world of parallel programming is at our command: 
We can at last write a concurrent simulation of N barking dogs. Program 8.1 
causes each dog to bark on a different node of a ring virtual machine. 


-machine(ring).                                           % R1

dogs(N) :-                                                % R2
    N > 0 |
    N1 is N - 1,
    dog(N,"Arf Arf!"),
    dogs(N1)@fwd.
dogs(0).                                                  % R3

dog(Dog,Woof) :-
    display({Dog,Woof}).


Program 8.1: N Barking Dogs 


An initial declaration informs the Strand compiler that the program will be 
executed on a ring virtual machine (R1). The program is similar in structure to 
programs that have been encountered throughout this text. The only new aspect 
is the direction annotation in the recursive process: dogs(N1)@fwd (R2). This 
causes the process structure of the program to be distributed around a ring. At 
each step, a single dog is spawned on the current node and the process continues 
execution at the next node (R2). Thus, each dog barks loudly ({N,"Arf Arf!"}) from 
an independent node. 

Figure 8.2 shows the computation as it develops with barking dogs at each 
point in a six node ring. It assumes that the process pool at the starting node 
initially contains the process dogs(10). 

After ten dog processes have been spawned, recursion reaches a stopping condi- 
tion (R3); the dogs process then terminates. Notice that because N (=10) is larger 
than the number of nodes (=6) spawning wraps around the ring; hence, more than 
one dog barks from the same node. This wrapping around the ring allows the 
virtual machine to be treated as if it were an infinite line of nodes. This provides 
a convenient programming abstraction: Any program whose structure forms a line 
can be mapped to a ring. To illustrate the use of the direction annotation further, 
we now consider two more complex mappings: list intersection and merge sort. 
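
The wrap-around can be made concrete with a small Python sketch. It assumes that nodes are numbered 0 to 5 clockwise from the starting node; the function name place_on_ring is ours. Each @fwd step is modelled as an increment modulo the number of nodes.

```python
def place_on_ring(n_dogs, n_nodes, start=0):
    """Return {node: [dog numbers]} for dogs(n_dogs) started at `start`."""
    placement = {node: [] for node in range(n_nodes)}
    node = start
    for dog in range(n_dogs, 0, -1):  # dogs(N) spawns dog(N) on the current node ...
        placement[node].append(dog)
        node = (node + 1) % n_nodes  # ... and continues @fwd on the next node
    return placement


print(place_on_ring(10, 6))
```

With ten dogs and six nodes, dogs 10 and 4 share the starting node, which is the wrapping effect described above.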


[Figure: a six-node ring with the ten dog processes distributed around it; for example, dog(10,...) and dog(4,...) share the starting node.]


Figure 8.2: N Barking Dogs 


8.1.1 List Intersection 


Recall from Section 3.2 the problem of forming the intersection of two lists. The
algorithm involves recursively inspecting each element of the first list to ascertain 
if it is a member of the second. The element is added to a list representing the 
intersection only if the test succeeds. 

In our specification, each element of the first list L1 is removed and a mem- 
bership test is performed. We can thus visualize the specification as a sequence 
of membership tests. A sequence is analogous to a line and lines map easily to 
rings. Thus, it is straightforward to functionally decompose the problem and map 
components to a ring. 


intersect(...) :-
    member(...),
    intersect(...)@fwd.
intersect(...).


At each step a member process is spawned and then recursion carries the process 
structure to the next node in the ring; thus, a member process is executed at each 
node. 

Program 8.2 shows the complete intersection program including process mapping
on a ring virtual machine (R1). The intersection of lists L1 and L2 is
represented by a difference list (Lb,Le). A member_add process is spawned for each
element X in the list L1 (R2). The @fwd annotation causes each member_add
process to execute at a different node in the ring (R2). If X is a member of L2 
it is added to the intersection (R4); otherwise nothing is added (R6). Spawning 
terminates at the end of L1 (R3). 


-machine(ring).                                           % R1

intersect([X|L1],L2,Lb,Le) :-                             % R2
    member_add(X,L2,Lb,Lm),
    intersect(L1,L2,Lm,Le)@fwd.
intersect([],_,Lb,Le) :-                                  % R3
    Lb := Le.

member_add(X,[X|_],Lb,Le) :-                              % R4
    Lb := [X|Le].
member_add(X,[X1|L2],Lb,Le) :-                            % R5
    X =\= X1 | member_add(X,L2,Lb,Le).
member_add(_,[],Lb,Le) :-                                 % R6
    Lb := Le.


Program 8.2: List Intersection on a Ring 


Recall that when describing difference lists we pointed out that the represen- 
tation allows a list to be constructed concurrently. Program 8.2 shows how a 
difference list can be incrementally constructed at independent computing sites. 
Each member_add process executes at a different site and generates some portion 
of the list. Thus, the representation is distributed across the nodes. However, it 
still corresponds to a list and can be manipulated by other processes. For example, 
the initial node may contain the processes: 


intersect([a,b,c,d,e],[x,y,a,b,z],L,[]), reverse(L,L1) 


This process pool uses the reverse process definition described in Section 2.6. 
Eventually, the value L1 will be the reverse of the distributed list L, i.e., [b,a]. 
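
A rough Python analogue of this behaviour is sketched below. It assumes a thread pool as a stand-in for the ring nodes, and the function names simply mirror those of Program 8.2. Each membership test yields an independent segment of the result, and concatenating the segments in order plays the role of chaining the difference-list variables.

```python
from concurrent.futures import ThreadPoolExecutor


def member_add(x, l2):
    """Return [x] if x occurs in l2, else [] (one segment of the result)."""
    return [x] if x in l2 else []


def intersect(l1, l2):
    # One task per element of l1, standing in for one member_add per ring node.
    with ThreadPoolExecutor() as pool:
        segments = list(pool.map(lambda x: member_add(x, l2), l1))
    # Joining the segments in order corresponds to linking Lb, Lm, ..., Le.
    return [x for segment in segments for x in segment]


l = intersect(["a", "b", "c", "d", "e"], ["x", "y", "a", "b", "z"])
print(l, list(reversed(l)))
```

The segments may be computed in any order; only their final concatenation is ordered.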


8.1.2 Merge Sort 


Merge sort is a well-known algorithm that takes an unsorted list of numbers as 
input and generates a sorted list as output. Figure 8.3 illustrates the algorithm 
when sorting an eight element list. The unsorted input is segmented into ordered 
pairs of numbers and a sequence of merge stages is created. In each merge stage, 
adjacent pairs of sublists are merged into a larger ordered sublist. At the last 
merge stage two ordered sublists, each comprising half the input, are merged to
form the result. The merging of each sublist can be carried out in parallel subject
only to data availability.


    [5,3,7,1,6,8,2,4]              unordered input

    [3,5] [1,7] [6,8] [2,4]        initial pairing

    [1,3,5,7] [2,4,6,8]            merge stage

    [1,2,3,4,5,6,7,8]              merge stage (ordered output)


Figure 8.3: Merge Sort Algorithm

Figure 8.3 demonstrates that conceptually the merge stages are generated along 
a line; again we may employ a functional decomposition and map the problem to 
a ring. The approach in this case is slightly more complex since it is desirable to 
compute the initial pairings concurrently with the first merge stage. In outline, 
the program structure is: 


mergesort(...) :-
    pairs(...),
    msort(...)@fwd.
msort(...) :-
    merge_stage(...),
    msort(...)@fwd.
msort(...).


There are two direction annotations in this outline: The first, in the mergesort 
definition, causes the first merge stage to be spawned at the second node in the 
ring. This mapping allows generation of the initial pairings and the spawning of 
the merge stages to proceed concurrently. The second annotation, in the msort 
process definition, ensures that each merge stage is mapped to a different node in 
the ring. 

Program 8.3 shows a complete Strand implementation of the merge sort al- 
gorithm. An initial process is spawned at the first node to compute the initial 
pairings (R2). Merge stages are spawned in successive ring nodes (R2,7). Each 
merge_stage merges successive pairs in a list of lists (R10-12). Two ordered lists
are merged into a single ordered list by the mlists process (R13-16). When only 
one list remains to be merged, the program terminates (R8). 


-machine(ring).                                           % R1

mergesort(L,L2) :- pair(L,L1), msort(L1,L2)@fwd.          % R2

pair([X1,X2|Xs],Ys) :-                                    % R3
    X1 =< X2 | Ys := [[X1,X2]|Ys1], pair(Xs,Ys1).
pair([X1,X2|Xs],Ys) :-                                    % R4
    X1 > X2 | Ys := [[X2,X1]|Ys1], pair(Xs,Ys1).
pair([X],Ys) :- Ys := [[X]].                              % R5
pair([],Ys) :- Ys := [].                                  % R6

msort([X1,X2|Xs],Ys) :-                                   % R7
    merge_stage([X1,X2|Xs],Zs), msort(Zs,Ys)@fwd.
msort([Xs],Xs1) :- Xs1 := Xs.                             % R8
msort([],Xs1) :- Xs1 := [].                               % R9

merge_stage([Xs1,Xs2|XXs],Ys) :-                          % R10
    Ys := [Ys1|YYs],
    mlists(Xs1,Xs2,Ys1),
    merge_stage(XXs,YYs).
merge_stage([Xs],Ys) :- Ys := [Xs].                       % R11
merge_stage([],Ys) :- Ys := [].                           % R12

mlists([X|Xs],[Y|Ys],O) :-                                % R13
    X =< Y | O := [X|Zs], mlists(Xs,[Y|Ys],Zs).
mlists([X|Xs],[Y|Ys],O) :-                                % R14
    X > Y | O := [Y|Zs], mlists([X|Xs],Ys,Zs).
mlists([],Ys,O) :- O := Ys.                               % R15
mlists(Xs,[],O) :- O := Xs.                               % R16


Program 8.3: Ring Merge Sort


This program incrementally feeds information through merge stages. As soon 
as data becomes available from a previous stage, the current merge stage can begin 
computing. Thus, many merge stages may compute concurrently. Although an 
interesting mapping, this program is unlikely to provide substantial speedup over
uniprocessor performance. Why?
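
For readers who wish to trace the stages by hand, the following Python sketch mirrors the structure of Program 8.3 sequentially. It is an illustration only: the function names follow the Strand program, the list of intermediate stages is returned purely for inspection, and nothing here models the concurrency or the ring mapping.

```python
def pair(xs):
    """Split xs into ordered pairs (a trailing single element stays alone)."""
    return [sorted(xs[i:i + 2]) for i in range(0, len(xs), 2)]


def mlists(xs, ys):
    """Merge two ordered lists into one ordered list."""
    merged, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            merged.append(xs[i])
            i += 1
        else:
            merged.append(ys[j])
            j += 1
    return merged + xs[i:] + ys[j:]


def merge_stage(lists):
    """Merge adjacent pairs of sublists; an odd one out is passed through."""
    out = []
    for i in range(0, len(lists), 2):
        if i + 1 < len(lists):
            out.append(mlists(lists[i], lists[i + 1]))
        else:
            out.append(lists[i])
    return out


def mergesort(xs):
    """Return (sorted list, list of stages) for inspection."""
    stages = [pair(xs)]
    while len(stages[-1]) > 1:  # one loop iteration per merge stage
        stages.append(merge_stage(stages[-1]))
    result = stages[-1][0] if stages[-1] else []
    return result, stages


result, stages = mergesort([5, 3, 7, 1, 6, 8, 2, 4])
for stage in stages:
    print(stage)
```

For the eight-element list of Figure 8.3 this prints the initial pairing followed by two merge stages.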


8.2 Torus Mappings 


A torus virtual machine is a mesh of nodes with end-around connections: Fig- 
ure 8.4 shows a 3 x 3 torus. The end-around connections in this structure ensure 
that processes spawned at the edge of a torus utilize nodes in a completely
different area of the virtual machine. Just as the ring virtual machine provided an
infinite-line abstraction, the torus provides an infinite-mesh abstraction. Any pro- 
gram having a mesh-like structure can be conveniently mapped to this new virtual 
machine. The torus can be programmed using a direction annotation that has four 
possible directions: north, east, south and west. 





Figure 8.4: A Torus Virtual Machine 


To illustrate this infinite surface, let us consider an evil dog problem in which 
N dogs eat N cats. For simplicity, the N dogs are placed in a line eastward from 
an initial node. Similarly, the N cats are placed in a line northward from the same 
node. The dogs begin walking north trying to catch a cat. The cats begin walking 
east in an effort to avoid the dogs. When a cat reaches its favorite spot, it sits 
and meows. When the dog reaches the same spot and hears a meow, it eats the 
cat! In outline, the program creates the dogs by spawning a dog at each node in 
an eastward direction: 


dogs(...) :-
    dog(...),
    dogs(...)@east.
dogs(...).


Each dog then walks northward toward the cat: 


dog(...) :-
    dog(...)@north.
dog(...).


A similar process structure is used to create the cats. Program 8.4 shows the 
complete program and states that it executes on a torus (R1). The dogs and cats
begin walking concurrently (R2). The dogs are spawned eastward and are given
an argument (N) that specifies where to expect the cat. Each dog also carries a 
variable (Cat) that can be used to hear a cat meow (R3). The cats are spawned 
northward and are given an argument (N) that specifies their favorite spot (R7). 
A dog proceeds northward for N steps while a cat proceeds eastward for N steps 
(R5,R9). A cat that reaches its spot meows (R10). A dog that reaches the same 
spot and hears the meow, eats the cat and barks with pleasure (R6). 


-machine(torus).                                          % R1

evil(N) :- dogs(0,N,Cats), cats(0,N,Cats).                % R2

dogs(N,M,[Cat|Cs]) :-                                     % R3
    N < M | N1 is N + 1,
    dog(N,N,Cat), dogs(N1,M,Cs)@east.
dogs(N,N,_).                                              % R4

dog(N,M,Cat) :-                                           % R5
    M > 0 | M1 is M - 1, dog(N,M1,Cat)@north.
dog(N,0,meow) :- display({N,"Arf Arf!"}).                 % R6

cats(N,M,Cs) :-                                           % R7
    N < M |
    Cs := [C|Cs1], N1 is N + 1,
    cat(N,C), cats(N1,M,Cs1)@north.
cats(N,N,_).                                              % R8

cat(M,Cat) :-                                             % R9
    M > 0 | M1 is M - 1, cat(M1,Cat)@east.
cat(0,Cat) :- display("Meow!"), Cat := meow.              % R10


Program 8.4: The Evil Dogs Program 


In summary, process mapping on a torus can be achieved in a similar fashion 
to a ring; however, this structure permits a greater variety of mappings. 
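
The end-around connections can be illustrated with a short Python sketch. It assumes that nodes are addressed by (x, y) coordinates, that north increases y and east increases x, and that the initial node is (0, 0); these conventions and the function names are ours. The sketch follows each dog and cat of the evil dog problem to its final node.

```python
def step(pos, direction, width, height):
    """Move one node on a width x height torus, wrapping at the edges."""
    x, y = pos
    dx, dy = {"north": (0, 1), "south": (0, -1),
              "east": (1, 0), "west": (-1, 0)}[direction]
    return ((x + dx) % width, (y + dy) % height)


def walk(pos, direction, steps, width, height):
    for _ in range(steps):
        pos = step(pos, direction, width, height)
    return pos


def meeting_points(n, width, height):
    """Final (dog node, cat node) for each of the n pairs."""
    points = []
    dog_start = cat_start = (0, 0)
    for i in range(n):
        dog = walk(dog_start, "north", i, width, height)  # dog(i,i,Cat)
        cat = walk(cat_start, "east", i, width, height)  # cat(i,C)
        points.append((dog, cat))
        dog_start = step(dog_start, "east", width, height)  # dogs(...)@east
        cat_start = step(cat_start, "north", width, height)  # cats(...)@north
    return points


print(meeting_points(5, 3, 3))
```

Even when N exceeds the size of the torus, each dog finishes at the same node as its cat, because both walks wrap in the same way.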


8.2.1 Matrix Multiply 


Section 7.1 presented a matrix multiplication program that takes two matrices A 
and B`! and computes C = A x B. A careful scrutiny of the program structure 
shows that a mesh of processes is created corresponding to points in the matrix 
C. Thus, the computation maps conveniently to a torus. In outline, the necessary 
direction annotations are: 




mm(...) :-
    row(...),
    mm(...)@east.
mm(...).

row(...) :-
    inner_prod(...),
    row(...)@north.
row(...).


This mapping scheme spawns a set of row processes in the east direction. Each of 
these processes then generates a set of inner product processes (inner_prod) that 
are spawned in the north direction. Thus, each element of the result matrix C is 
computed at a different node by a unique inner product. Program 8.5 gives the 
complete program with the appropriate direction annotations. 


-machine(torus).

mm([A|As],Bs,Cs) :-
    row(A,Bs,Rs),
    list_to_vector(Rs,C),
    Cs := [C|Cs1],
    mm(As,Bs,Cs1)@east.
mm([],_,Cs) :- Cs := [].

row(A,[B|Bs],Rs) :-
    inner_prod(A,B,R),
    Rs := [R|Rs1],
    row(A,Bs,Rs1)@north.
row(_,[],Rs) :- Rs := [].


Program 8.5: Matrix Multiplication on a Torus 
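
The placement implied by these annotations can be sketched in Python. We assume here that the second argument is supplied as a list of the columns of B, that the torus has the given width and height, and that the computation starts at node (0, 0); the function mm_on_torus and the placement dictionary are illustrative inventions, not part of the Strand program.

```python
def inner_prod(a, b):
    return sum(x * y for x, y in zip(a, b))


def mm_on_torus(a_rows, b_cols, width, height):
    """Multiply using rows of A and columns of B.

    Returns (C, placement), where placement[(i, j)] is the torus node
    assumed to compute element C[i][j].
    """
    c, placement = [], {}
    for i, a in enumerate(a_rows):  # mm(...)@east: one step east per row of A
        row = []
        for j, b in enumerate(b_cols):  # row(...)@north: one step north per inner product
            row.append(inner_prod(a, b))
            placement[(i, j)] = (i % width, j % height)
        c.append(row)
    return c, placement


# A = [[1,2],[3,4]], B = [[5,6],[7,8]]; the columns of B are [5,7] and [6,8].
c, where = mm_on_torus([[1, 2], [3, 4]], [[5, 7], [6, 8]], 3, 3)
print(c, where)
```

On a torus at least as large as the result matrix, every element is computed at a distinct node; on a smaller torus the modulo arithmetic causes several elements to share a node.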


8.3 Comparing Regular Mappings 


The previous sections have introduced two simple process mappings for regular 
problems. It is now valuable to show a single problem solved using both mappings. 
The problem we consider is that of integrating the function X^2 + 2 over the closed
interval [a,b]. Figure 8.3 shows a simple method to solve this problem. The area 
under the curve is divided into a set of equi-spaced strips of width W. The integral 
is then approximated by summing an approximation of the area of each strip. The
area between two points X and Y can be approximated independently of the other
strips using the formula:


Area = W × (X^2 + Y^2 + 4)/2







Figure 8.3: Approximating the Integral of X^2 + 2.


Ring Mapping. Since the area of each strip can be computed independently 
it is possible to execute the calculations at independent nodes. Since the problem 
is decomposed along the X axis, the structure of the decomposition is, in essence, 
a line. The line of calculations can easily be mapped to a ring virtual machine. In 
outline, the mapping has the following form: 


integrate(...) :-
    strip_area(...),
    integrate(...)@fwd.
integrate(...).


The structure of this outline is similar to that of the N Barking Dogs program. The 
@fwd annotation in the first integrate rule carries the process structure along a line 
in the virtual machine. At each step, a strip_area process is spawned to compute the 
area of a single strip. The second integrate rule signifies that spawning terminates 
at the end of the interval [a,b]. Refining this outline yields Program 8.6 which 
solves the integration problem. 

The area of each strip is added to an accumulator (Ac) propagated through the 
process structure; this initially has value zero (R1). A variable (Area) representing 
the result is also propagated. Each strip_area process computes the area of a
single strip (SArea) with width W (R4). Spawning terminates when all strip_area 
processes have been generated (R3). The value of the accumulator is eventually 
returned as the result of the computation (Area); thus the result is accessible in 
both the initial and final nodes of the virtual machine (R3). 




-machine(ring).

integrate(A,B,W,Area) :-                                  % R1
    integrate1(A,B,W,0,Area).
integrate1(A,B,W,Ac,Area) :-                              % R2
    A =< B |
    M is A + W,
    strip_area(A,M,W,SArea),
    Ac1 is Ac + SArea,
    integrate1(M,B,W,Ac1,Area)@fwd.
integrate1(A,B,_,Ac,Area) :-                              % R3
    A > B | Area := Ac.

strip_area(X,Y,W,SArea) :-                                % R4
    SArea is W*((X*X)+(Y*Y)+4)/2.


Program 8.6: Integration on a Ring 


Observe that the computation performed at each node is a single strip area 
calculation: 


SArea is W*((X*X)+(Y*Y) + 4)/2 


This granularity is so small that the program is unlikely to benefit from parallel 
execution. However, granularity can be increased by a slight modification to the 
strip_area process. Each process subdivides its segment of the interval into many
strips to be calculated at a single node. This modification is left as an exercise. 
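
One possible shape for the coarser-grained computation is sketched below in Python, as an illustration rather than a solution to the exercise. It assumes the interval is divided evenly among a fixed number of nodes and that each node sums a fixed number of strips; the names segment_area and integrate_ring and both parameters are ours. The loop stands in for the chain of @fwd steps that passes the accumulator around the ring.

```python
def f(x):
    return x * x + 2


def strip_area(x, y):
    """Trapezoidal area of the strip between x and y; equals W*(X^2 + Y^2 + 4)/2."""
    return (y - x) * (f(x) + f(y)) / 2


def segment_area(a, b, strips):
    """Work for one node: split [a, b] into `strips` strips and sum them."""
    w = (b - a) / strips
    return sum(strip_area(a + k * w, a + (k + 1) * w) for k in range(strips))


def integrate_ring(a, b, nodes, strips_per_node):
    """Accumulate one segment per node, passing the running sum forward."""
    seg = (b - a) / nodes
    acc = 0.0
    for n in range(nodes):  # each iteration stands for one @fwd step
        acc += segment_area(a + n * seg, a + (n + 1) * seg, strips_per_node)
    return acc


print(integrate_ring(0.0, 3.0, 6, 100))
```

The exact integral of X^2 + 2 over [0,3] is 15, so the printed value should be close to that figure.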

Torus Mapping. The ring mapping for this problem is particularly simple. 
However, it is instructive to design a torus mapping for the benefit of comparison. 
In outline, this process mapping is as follows: 


integrate(...) :-
    strip_set(...),
    integrate(...)@east.
integrate(...).

strip_set(...) :-
    strip_area(...),
    strip_set(...)@north.
strip_set(...).


This mapping structure is identical to that used in the matrix multiplication
program designed in Section 8.2. The essential idea is to group sets of strips together
in an easterly direction. Each group is then computed in a northerly direction.
The details of the refinement are left as an exercise. 

Notice that in the ring mapping for this problem the calculation of the Nth 
strip is begun only after O(N) process reductions. In the torus mapping, the
last strip computation begins after only O(√N) reductions. In addition, the ring
mapping must compute N sum calculations in sequence as each depends on the 
result of the previous sum. Contrast this with the torus mapping which permits 
column calculations to be summed independently. 
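
The comparison can be made concrete with a Python sketch. It assumes a square arrangement in which strips are grouped into columns (the east direction) with the strips of each column summed independently (the north direction); the names integrate_torus and spawn_steps are hypothetical, and spawn_steps counts only the forwarding steps needed to reach the last strip when N is a perfect square.

```python
import math


def f(x):
    return x * x + 2


def integrate_torus(a, b, columns, rows):
    """Group strips into `columns` groups (east); sum each group (north)."""
    w = (b - a) / (columns * rows)
    column_sums = []
    for c in range(columns):  # integrate(...)@east
        total = 0.0
        for r in range(rows):  # strip_set(...)@north
            x = a + (c * rows + r) * w
            total += w * (f(x) + f(x + w)) / 2
        column_sums.append(total)  # each column is summed independently
    return sum(column_sums), column_sums


def spawn_steps(n_strips):
    """Forwarding steps before the last strip starts (n_strips a perfect square)."""
    side = math.isqrt(n_strips)
    return {"ring": n_strips - 1, "torus": 2 * (side - 1)}


area, per_column = integrate_torus(0.0, 3.0, 10, 10)
print(area, spawn_steps(100))
```

For 100 strips the ring needs 99 forwarding steps to reach the last strip, whereas a 10 x 10 torus needs 18.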


8.4 An Irregular Mapping 


Each of the problems we have considered in previous sections has had a regular 
structure that could be utilized in mapping the problem to a virtual machine. In 
this section, we consider an irregular problem called Triangle. This problem is a 
simple search problem involving a triangular board of the form: 


            1
          2   3
        4   5   6
      7   8   9   10
   11  12  13  14  15


In its initial state, the board has a peg in all positions except position 5. Pegs 
may be removed by jumping (as in checkers). The first move is given and consists 
of the peg in position 12 jumping that in position 8. The puzzle is solved when 
only a single peg remains after thirteen successful moves. A solution consists of 
the sequence of moves made in solving the puzzle. All 775 solutions are to be 
generated. 

The initial state and the set of possible moves define a search tree. Each vertex 
in the tree corresponds to a legal board; each edge corresponds to a legal move. 
However, this tree has an irregular shape because only a subset of the possible 
moves can be made at a given vertex. The subset is determined by the moves 
made to reach that vertex in the tree. 

Program 8.7 presents a solution to the problem that enumerates all possible 
paths in the problem concurrently. A board is represented as a tuple of integers. 
The integer 1 at a position in the tuple indicates the presence of a peg in the 
board; a 0 indicates an empty position. The program begins execution by making 
an initial move (R1). This move carries details of the pegs involved (12=1,8=1,5=0), 
the initial board (B), the number of moves remaining (13) and the legal moves in 
the puzzle (Mvs). An accumulator to hold the solution to the problem is initially 
represented by an empty list ([]). 

The move process makes a single move on a board. If the move is not legitimate, 
then the move process terminates indicating that this area of the search does not 
lead to a solution (R3). A move is legitimate if the current board contains a peg 


200 Chapter 8. Process Mapping 


triangle :-                                                        % R1
    get_moves(Mvs), get_board(B), move(12,8,5,1,1,0,B,13,Mvs,[]).

move(F,O,T,1,1,0,B,D,Ms,S) :-                                      % R2
    ND is D - 1, make_tuple(15,NB),
    put_arg(F,NB,0), put_arg(O,NB,0), put_arg(T,NB,1),
    copy(15,F,O,T,B,NB), try_moves(ND,Ms,NB,Ms,[{F,O}|S]).
move(_,_,_,_,_,_,_,_,_,_) :- otherwise | true.                     % R3

try_moves(D,Mvs,B,AllMvs,S) :-                                     % R4
    D > 0 | make_moves(Mvs,B,D,AllMvs,S).
try_moves(0,_,_,_,S) :- display(S).                                % R5

make_moves([{F,O,T}|Mvs],B,D,Ms,S) :-                              % R6
    get_arg(F,B,X0), get_arg(O,B,X1), get_arg(T,B,X2),
    move(F,O,T,X0,X1,X2,B,D,Ms,S), make_moves(Mvs,B,D,Ms,S).
make_moves([],_,_,_,_).                                            % R7

copy(N,F,O,T,B,NB) :-                                              % R8
    N > 0 | N1 is N - 1, copy1(N,F,O,T,B,NB), copy(N1,F,O,T,B,NB).
copy(0,_,_,_,_,_).                                                 % R9
copy1(N,F,O,T,B,NB) :-                                             % R10
    N =\= F, N =\= O, N =\= T | get_arg(N,B,A), put_arg(N,NB,A).
copy1(_,_,_,_,_,_) :- otherwise | true.                            % R11

get_moves(X) :- X := [{1,2,4},{2,4,7},{4,7,11},{3,5,8},{5,8,12},
    {6,9,13},{1,3,6},{3,6,10},{6,10,15},{2,5,9},{5,9,14},
    {4,8,13},{11,12,13},{12,13,14},{13,14,15},{7,8,9},{8,9,10},
    {4,5,6},{4,2,1},{7,4,2},{11,7,4},{8,5,3},{12,8,5},{13,9,6},
    {6,3,1},{10,6,3},{15,10,6},{9,5,2},{14,9,5},{13,8,4},
    {13,12,11},{14,13,12},{15,14,13},{9,8,7},{10,9,8},{6,5,4}].

get_board(B) :- B := {1,1,1,1,0,1,1,1,1,1,1,1,1,1,1}.


Program 8.7: The Triangle Program 


8.4. An Irregular Mapping 201 


(1) in the from position (F), a peg in the over position (O) and a hole (0) in the 
to position T (R2). In this case, the move is made by copying the board with 
new entries in these three positions (R2,R8-11); the solution so far is updated to 
include the move made ([{F,O}|S]). Finally, a try_moves process is invoked to decide
if a solution has been reached. If the puzzle has been solved, then the solution is 
displayed (R5); otherwise, an attempt is made to make each possible legal move 
on the new board (make_moves), thus further expanding the search space (R4). 
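
The search just described can be mirrored in a sequential Python sketch. This is not Strand and it expands the tree depth-first rather than concurrently; the function names follow Program 8.7, while the list-based board with an unused index 0 and the `solutions` accumulator are assumptions made for illustration.

```python
# Jump triples (from, over, to) along each line of the board, as in get_moves.
FORWARD = [
    (1, 2, 4), (2, 4, 7), (4, 7, 11), (3, 5, 8), (5, 8, 12), (6, 9, 13),
    (1, 3, 6), (3, 6, 10), (6, 10, 15), (2, 5, 9), (5, 9, 14), (4, 8, 13),
    (11, 12, 13), (12, 13, 14), (13, 14, 15), (7, 8, 9), (8, 9, 10), (4, 5, 6),
]
MOVES = FORWARD + [(t, o, f) for (f, o, t) in FORWARD]  # both directions


def move(board, f, o, t, remaining, path, solutions):
    """Make one jump if it is legitimate (R2); otherwise drop the branch (R3)."""
    if not (board[f] == 1 and board[o] == 1 and board[t] == 0):
        return
    new_board = list(board)  # copy the board, then overwrite three positions
    new_board[f], new_board[o], new_board[t] = 0, 0, 1
    try_moves(new_board, remaining - 1, [(f, o)] + path, solutions)


def try_moves(board, remaining, path, solutions):
    """Record a finished game (R5) or try every move on this board (R4, R6, R7)."""
    if remaining == 0:
        solutions.append(path)
        return
    for f, o, t in MOVES:
        move(board, f, o, t, remaining, path, solutions)


def triangle():
    """Return every solution that starts with peg 12 jumping peg 8 (R1)."""
    board = [None] + [1] * 15  # index 0 is unused so positions run 1..15
    board[5] = 0
    solutions = []
    move(board, 12, 8, 5, 13, [], solutions)
    return solutions


if __name__ == "__main__":
    print(len(triangle()))
```

As in the Strand program, each solution lists the most recent move first, so the given opening move appears last. Run as a script, the sketch should report the 775 solutions cited in the problem statement.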

The solution as described has two primary problems: It generates too many 
processes and its irregular structure does not map to any particular virtual ma- 
chine. We will consider the latter of these problems first. 


8.4.1 Problem Mapping 


Since the problem has an irregular structure, we prefer to utilize a load balancing 
strategy rather than map the problem directly to a virtual machine. The scheme 
we will use is called the manager-worker strategy. The essence of this tech- 
nique is to utilize two generic processes: a manager and a worker. The manager 
is responsible for partitioning a problem into subproblems and allocating these to 
workers. The workers are responsible for solving a single subproblem and request- 
ing additional work from the manager when they become idle. 

To apply this solution strategy to the Triangle problem we divide up the search 
space into portions as shown in Figure 8.6. The search space is decomposed into a 
portion corresponding to the top levels of the space and a separate portion for each 
subtree. The manager-worker scheme can now be applied easily. The manager will 
be responsible for generating the top levels of the space and the workers for solving 
subtrees independently. 

Since the manager must be able to communicate subproblems to all of the 
workers, the organization of the strategy is in essence star-shaped. The manager 
sits at the center of the star and the workers sit at its tips. Each arm of the 
star corresponds to a communication channel used by the manager to allocate 
subproblems to a particular worker. Fortunately, it is easy to map a star onto a 
ring virtual machine using the direction annotation described earlier. In outline, 
the algorithm to achieve this has the form: 


star(...) :-
    center(...),
    tips(...)@fwd.

tips(...) :-
    tip(...),
    tips(...)@fwd.
tips(...).




[Figure: the top levels of the search space end at the fringe; each subtree below the fringe is a subproblem.]



Figure 8.6: Triangle Partitioning 


This algorithm spawns the center of the star at the first node in the ring. It 
subsequently spawns the required number of tips around the ring beginning at 
the second node. As the tips are spawned each must carry a stream to the center 
in order to form the spokes of the star. If the center process in the outline is 
now replaced by a manager process and each tip replaced by a worker, then the 
manager-worker process structure has been created. 

Program 8.8 shows the complete program. It is defined using a manager and 
a set of workers (R2). The manager process is spawned at the initial node in 
the virtual machine; workers are spawned at subsequent nodes. At each step of 
the algorithm a worker and a stream to the manager are placed in the process 
structure (R3). All streams are merged into a single stream to the manager at 
the initial node (R2). On completion of spawning, the structure that remains is 
the required star. The manager receives all requests generated by workers on a 
single input stream Requests. Figure 8.8 shows the behavior of the program while 
creating this process structure. 
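
The placement that results from this spawning pattern can be illustrated with a small Python sketch. It is not Strand: `spawn_star` and its dictionary result are hypothetical, and the sketch assumes that one `@fwd` step moves to the next node and wraps around the ring.

```python
def spawn_star(n_workers, ring_size):
    """Return a mapping from process name to ring node.

    The manager stays at the initial node (node 0); each worker is placed
    one @fwd step beyond the previous process, wrapping around the ring.
    """
    placement = {"manager": 0}
    node = 0
    for k in range(1, n_workers + 1):
        node = (node + 1) % ring_size  # one step in the fwd direction
        placement[f"worker {k}"] = node
    return placement


if __name__ == "__main__":
    for name, node in spawn_star(4, 8).items():
        print(name, "-> node", node)
```

If there are more workers than nodes, the wrap-around places several workers on the same node; the streams back to the manager are unaffected by where each worker lands.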


8.4.2 Manager Definition 


Recall that the manager process performs two basic activities: partitioning a prob- 
lem into a number of subproblems and balancing the load by sending subproblems 
to workers when they become idle. The manager is spawned with a single stream 
containing requests for work from the workers. It immediately forks into two 




-machine(ring).                                % R1

triangle(N) :-                                 % R2
    manager(Requests),
    merger(Streams,Requests),
    workers(N,Streams)@fwd.

workers(N,Ss) :-                               % R3
    N > 0 |
    N1 is N - 1,
    Ss := [merge(S)|Ss1],
    worker(S),
    workers(N1,Ss1)@fwd.
workers(0,Ss) :- Ss := [].                     % R4


Program 8.8: Spawning the Manager-Worker 


[Figure: the manager and workers 1 to 4 on successive ring nodes, linked by the spawning path and by streams back to the manager.]


Figure 8.8: Spawning the Manager-Worker Process Structure 




processes. The first is responsible for partitioning and the second for balancing. 
The partitioning process generates a list of subproblems and the balancing process 
allocates these to workers in response to requests for work: 


manager(Requests) :-
    partition(SubProblems),
    balance(SubProblems,Requests).


Partitioning. The partitioning activity for the Triangle problem is essentially 
one of generating the top levels of the search space. This can be achieved by 
modifying Program 8.7 to include a difference list of subproblems. At the fringe 
of the top levels of the search space subproblems (i.e., game boards) are added 
to this list rather than being expanded further. Recall that the original program 
began execution as follows. 


triangle :-
    get_moves(Mvs),
    get_board(B),
    move(12,8,5,1,1,0,B,13,Mvs,[]).


The modified program begins partitioning in a similar manner but carries a dif- 
ference list of subproblems: 


partition(Bds) :-
    get_moves(Mvs),
    get_board(B),
    move(12,8,5,1,1,0,B,13,Mvs,[],Bds,[]).


The difference list (Bds) must be threaded through all the process definitions 
in the original program. The program rule that adds subproblems to the list 
corresponds to the rule that expanded game boards in the original program. Recall 
the following process definition from the original program: 


try_moves(D,Mvs,B,AllMvs,S) :-
    D > 0 |
    make_moves(Mvs,B,D,AllMvs,S).
try_moves(0,_,_,_,S) :-
    display(S).


This process is responsible for detecting if a solution has been found. The first rule 
expands the search space; the second displays solutions. The process can easily 
be modified to generate the appropriate subproblems. Above the fringe the space 
is expanded as before. On reaching the fringe, the current subproblem is inserted 
into the list of subproblems for solution by workers. For example, assuming the 
top five levels of the space are to be expanded, the try_moves process is redefined 
as: 




try_moves(D,Mvs,B,AllMvs,S,Bdb,Bde) :-
    D > 8 |
    make_moves(Mvs,B,D,AllMvs,S,Bdb,Bde).
try_moves(8,_,B,_,S,Bdb,Bde) :-
    Bdb := [{B,8,S}|Bde].
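
The effect of this redefinition can be sketched in Python. This is a sequential stand-in rather than Strand: an ordinary list plays the role of the difference list, and `FRINGE`, `expand` and the list-based board (index 0 unused) are names assumed for illustration.

```python
FORWARD = [
    (1, 2, 4), (2, 4, 7), (4, 7, 11), (3, 5, 8), (5, 8, 12), (6, 9, 13),
    (1, 3, 6), (3, 6, 10), (6, 10, 15), (2, 5, 9), (5, 9, 14), (4, 8, 13),
    (11, 12, 13), (12, 13, 14), (13, 14, 15), (7, 8, 9), (8, 9, 10), (4, 5, 6),
]
MOVES = FORWARD + [(t, o, f) for (f, o, t) in FORWARD]
FRINGE = 8  # moves remaining when a board is handed to a worker


def expand(board, remaining, path, subproblems):
    """Expand above the fringe; at the fringe, emit {B,8,S} as a subproblem."""
    if remaining == FRINGE:
        subproblems.append((board, remaining, path))
        return
    for f, o, t in MOVES:
        if board[f] == 1 and board[o] == 1 and board[t] == 0:
            new_board = list(board)
            new_board[f], new_board[o], new_board[t] = 0, 0, 1
            expand(new_board, remaining - 1, [(f, o)] + path, subproblems)


def partition():
    """Generate the top five levels and return the fringe subproblems."""
    board = [None] + [1] * 15  # index 0 unused; positions run 1..15
    board[5] = 0
    board[12], board[8], board[5] = 0, 0, 1  # the given first move
    subproblems = []
    expand(board, 12, [(12, 8)], subproblems)
    return subproblems


if __name__ == "__main__":
    print(len(partition()), "subproblems at the fringe")
```

Boards that run out of legal moves above the fringe contribute nothing to the list, which matches the Strand version, where a failed branch simply closes its portion of the difference list.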


Balancing. The problem of balancing is to keep the workers busy. We will 
assume that when a worker becomes idle it requests additional work from the 
manager. The task of balancing is thus one of granting requests for work. To 
achieve this functionality, we define a balancing process that receives a list of 
subproblems and a stream of requests from workers. 


balance([Subproblem|Bds],[req(Reply)|Rs]) :-               % R1
    Reply := Subproblem,
    balance(Bds,Rs).
balance([],[req(Reply)|Rs]) :-                             % R2
    Reply := stop,
    balance([],Rs).
balance(_,[]).                                             % R3


Each request is an incomplete message that contains a slot for a reply. The bal- 
ancing process simply matches requests to subproblems (R1). In the event that all 
subproblems have been allocated to workers, the balance process sends a stop reply 
for subsequent requests. This indicates to a worker that there is no further work to 
be carried out and that it should terminate (R2). The balance process terminates 
when all workers have terminated and closed their request streams (R3). 
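
The matching of requests to subproblems can be sketched in Python. This is an illustration, not Strand: each request is modeled as an empty list that stands in for the unbound reply variable of an incomplete message, and the requests are assumed to be available as a finite sequence.

```python
def balance(subproblems, requests):
    """Fill each request's reply slot with a subproblem, or with "stop"."""
    pending = list(subproblems)
    for reply in requests:  # the loop ends when the request stream closes (R3)
        if pending:
            reply.append(pending.pop(0))  # R1: grant the next subproblem
        else:
            reply.append("stop")          # R2: nothing left to allocate


if __name__ == "__main__":
    slots = [[] for _ in range(5)]
    balance(["sp1", "sp2", "sp3"], slots)
    print(slots)
```

Writing into the slot plays the role of the assignment `Reply := Subproblem`: the worker that issued the request sees the value appear in the message it sent.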


8.4.3 Worker Definition 


The worker program is responsible for solving a single subproblem allocated by the 
manager. This, again, is achieved using Program 8.7: The program is modified to 
request a subproblem from the manager and then solve the subproblem as before. 


A worker should only request additional work when it is idle. Thus, it is 
necessary to detect when a worker has completed a subproblem. This termination 
detection can be achieved by threading a short-circuit through the worker code 
definitions as described in Section 3.3. 


It is valuable to be able to communicate the next subproblem for a given 
worker while the worker is progressing on its current subproblem. To achieve this 
overlapping of execution and communication, a new request for work is issued as 
soon as a subproblem is received. The code that achieves this synchronization is: 




worker(Rs) :-                                              % R1
    Rs := [req(Subproblem)|Rs1],
    get_moves(Mvs),
    worker(done,Subproblem,Rs1,Mvs).

worker(done,{B,D,S},Rs,Mvs) :-                             % R2
    Rs := [req(Subproblem)|Rs1],
    make_moves(done,Done,Mvs,B,D,Mvs,S),
    worker(Done,Subproblem,Rs1,Mvs).
worker(_,stop,Rs,_) :- Rs := [].                           % R3


A worker requests a subproblem as soon as it is invoked (R1). Each time it
receives a subproblem, a make_moves process is created to solve the subproblem
and another subproblem is requested (R2). If there are no further subproblems to
solve, a stop message arrives indicating that the worker may terminate (R3).

Notice the short-circuit threaded through the make_moves process: This begins 
at the first argument and ends at the second (Done). The short-circuit is used to 
ensure that a worker only solves a single subproblem at a time. Even though the 
next subproblem may have arrived, the worker does not begin to solve it until the 
current subproblem is complete (Done=done).
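
The overlap of execution and communication can be imitated with threads in Python. This is a sketch under stated assumptions, not the Strand mechanism: a shared `queue.Queue` stands in for the merged request stream, a one-element queue stands in for each reply slot, `None` marks a closed request stream, and `solve` is a hypothetical function that handles one subproblem.

```python
import queue
import threading

STOP = "stop"


def manager(subproblems, requests, n_workers):
    """Answer requests until every worker has closed its request stream."""
    pending = iter(subproblems)
    closed = 0
    while closed < n_workers:
        slot = requests.get()
        if slot is None:                 # a worker closed its stream
            closed += 1
        else:
            slot.put(next(pending, STOP))  # fill the reply slot


def worker(requests, solve, results):
    """Solve one subproblem at a time, always keeping one request in flight."""
    slot = queue.Queue(1)
    requests.put(slot)
    current = slot.get()
    while current != STOP:
        next_slot = queue.Queue(1)
        requests.put(next_slot)          # ask for more work before starting
        results.append(solve(current))   # finish the current subproblem first
        current = next_slot.get()        # only now take up the next one
    requests.put(None)


def run(subproblems, solve, n_workers):
    requests, results = queue.Queue(), []
    threads = [threading.Thread(target=worker, args=(requests, solve, results))
               for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    manager(subproblems, requests, n_workers)
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(sorted(run(range(10), lambda x: x * x, 3)))
```

In the Strand program the ordering comes from data-flow on the short-circuit variable; here the same ordering falls out of the sequential loop body, so the sketch shows only the request pattern, not the synchronization technique itself.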


8.4.4 Constraining Concurrency 


As pointed out earlier, a completely unrestricted concurrent expansion of the Tri- 
angle problem search space generates many thousands of processes. In all like- 
lihood, it will not be possible to expand all search paths concurrently due to 
memory constraints. There are many ways to overcome this problem involving 
various types of search strategy. In Section 5.4 some of these alternatives were 
described in detail. 

The manager’s expansion of the first few levels of the search space is not partic- 
ularly space consuming; thus, we wish to constrain only the subtree computations 
executed by workers. To achieve this the worker code is modified to pursue a 
depth-first search rather than a full concurrent expansion. This could be achieved 
by adding a synchronization chain as explained in Section 5.4; however, it is more 
expedient to use the pre-existing short-circuit that is available for detecting worker 
termination. Only a single change is necessary; recall the make_moves process in 
the original program: 


make_moves([{F,O,T}|Mvs],B,D,Ms,S) :-
    get_arg(F,B,X0), get_arg(O,B,X1), get_arg(T,B,X2),
    move(F,O,T,X0,X1,X2,B,D,Ms,S),
    make_moves(Mvs,B,D,Ms,S).
make_moves([],_,_,_,_).


This process indiscriminately expands the search space by generating all possible 
moves from the current board (B). To constrain the search, this process is modified 




to wait for one move to be completed before expanding the next. The short-circuit 
added to detect worker termination can be used to achieve this as follows: 


make_moves(done,Done,[{F,O,T}|Mvs],B,D,Ms,S) :-
    get_arg(F,B,X0), get_arg(O,B,X1), get_arg(T,B,X2),
    move(done,D1,F,O,T,X0,X1,X2,B,D,Ms,S),
    make_moves(D1,Done,Mvs,B,D,Ms,S).
make_moves(D,D1,[],_,_,_,_) :- D1 := D.


Notice that the recursive call is forced to wait for the string done to be available 
at the first argument. This is returned from the expansion of the previous move 
on the current board. In consequence, the next move is delayed until the previous 
move is completed. Thus, only one sequence of moves is expanded at a time. Since 
a depth-first search is to be used, it is possible to modify the program so that the 
board is not copied at every step of the algorithm. This can be achieved using a 
dictionary process to record modifications to the board. 
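
The way a short-circuit serializes processes that were all spawned at once can be imitated in Python. This is a loose analogy rather than Strand: each `threading.Event` stands in for one variable of the chain, setting it corresponds to binding it to done, and `run_chained` is a hypothetical helper.

```python
import threading


def run_chained(tasks):
    """Spawn every task at once, but let the chain decide the order."""
    results = []
    first = threading.Event()
    first.set()                       # the left end of the chain is already done
    prev, threads = first, []
    for task in tasks:
        done = threading.Event()      # the right-hand end of this link

        def step(task=task, wait_for=prev, signal=done):
            wait_for.wait()           # suspend until the previous step completes
            results.append(task())
            signal.set()              # close this link of the chain

        threads.append(threading.Thread(target=step))
        prev = done
    for thread in reversed(threads):  # start order is deliberately scrambled
        thread.start()
    prev.wait()                       # the right end closes when all are done
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_chained([lambda i=i: i for i in range(5)]))
```

Although the threads are started in reverse, the results appear in chain order, because each step waits for its left-hand neighbor exactly as the recursive make_moves call waits for done.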


8.4.5 Discussion 


The final Triangle program is composed of Programs 8.8, 8.9 and 8.10. Program
8.8 shows the complete code for spawning the manager-worker process structure 
on a ring virtual machine. Program 8.9 shows the manager program that performs 
problem partitioning and load management. Program 8.10 shows the worker pro- 
gram in which a depth-first search is used to complete subproblems. These pro- 
grams use utilities from the original program. 

The code for spawning the manager-worker structure is generic and can be used 
in a variety of irregular problems. The strategy is an important programming tool 
and is actually provided as a system library with Strand. The Triangle program 
could have been designed to use this library; this would have entailed the definition 
of only the partition and worker processes. All of the ring mapping and balancing 
code would be provided by the library. 

We have presented the manager-worker implementation because many pro- 
gramming problems can utilize some variation of the basic technique; program- 
mers need to understand the implementation in order to modify it. One variation 
has already been discussed at length in Section 5.2. The solution to the Speedy 
Pizza problem employs a type of manager-worker strategy that accounts for data 
dependencies between subproblems. Two other important variations can be dis- 
tinguished. A hierarchical set of manager processes can be used to reduce the 
bottleneck at the manager should this be problematic. Secondly, it is possible for 
workers to feed back partially expanded subproblems to the manager to have them
re-assigned; this strategy is explored in Chapter 11. 

The manager-worker strategy is only appropriate if the number of subproblems 
that are generated is at least an order of magnitude greater than the number 
of computers to be utilized. It involves a ramp-up period at the beginning of 
execution when not all workers are active and a ramp-down period at the end of 




manager(Reqs) :-
    partition(SPs),
    balance(SPs,Reqs).

partition(Bds) :-
    get_moves(Mvs), get_board(B),
    move(12,8,5,1,1,0,B,13,Mvs,[],Bds,[]).

move(F,O,T,1,1,0,B,D,Ms,S,Bdb,Bde) :-
    ND is D - 1, make_tuple(15,NB),
    put_arg(F,NB,0), put_arg(O,NB,0), put_arg(T,NB,1),
    copy(15,F,O,T,B,NB),
    try_moves(ND,Ms,NB,Ms,[{F,O}|S],Bdb,Bde).
move(_,_,_,_,_,_,_,_,_,_,Bdb,Bde) :-
    otherwise | Bdb := Bde.

try_moves(D,Mvs,B,AllMvs,S,Bdb,Bde) :-
    D > 8 |
    make_moves(Mvs,B,D,AllMvs,S,Bdb,Bde).
try_moves(8,_,B,_,S,Bdb,Bde) :-
    Bdb := [{B,8,S}|Bde].

make_moves([{F,O,T}|Mvs],B,D,Ms,S,Bdb,Bde) :-
    get_arg(F,B,X0), get_arg(O,B,X1), get_arg(T,B,X2),
    move(F,O,T,X0,X1,X2,B,D,Ms,S,Bdb,Bdm),
    make_moves(Mvs,B,D,Ms,S,Bdm,Bde).
make_moves([],_,_,_,_,Bdb,Bde) :-
    Bdb := Bde.

balance([Sub|Bds],[req(Reply)|Rs]) :-
    Reply := Sub, balance(Bds,Rs).
balance([],[req(Reply)|Rs]) :-
    Reply := stop, balance([],Rs).
balance(_,[]).


Program 8.9: The Manager Code 




worker(Rs) :-
    Rs := [req(Sub)|Rs1],
    get_moves(Mvs),
    worker(done,Sub,Rs1,Mvs).

worker(done,{B,D,S},Rs,Mvs) :-
    Rs := [req(Sub)|Rs1],
    make_moves(done,Done,Mvs,B,D,Mvs,S),
    worker(Done,Sub,Rs1,Mvs).
worker(_,stop,Rs,_) :- Rs := [].

make_moves(done,Done,[{F,O,T}|Mvs],B,D,Ms,S) :-
    get_arg(F,B,X0), get_arg(O,B,X1), get_arg(T,B,X2),
    move(done,D1,F,O,T,X0,X1,X2,B,D,Ms,S),
    make_moves(D1,Done,Mvs,B,D,Ms,S).
make_moves(D,D1,[],_,_,_,_) :- D1 := D.

move(D1,D2,F,O,T,1,1,0,B,D,Ms,S) :-
    ND is D - 1, make_tuple(15,NB),
    put_arg(F,NB,0), put_arg(O,NB,0), put_arg(T,NB,1),
    copy(15,F,O,T,B,NB),
    try_moves(D1,D2,ND,Ms,NB,Ms,[{F,O}|S]).
move(D1,D2,_,_,_,_,_,_,_,_,_,_) :-
    otherwise | D2 := D1.

try_moves(D1,D2,D,Mvs,B,AllMvs,S) :-
    D > 0 |
    make_moves(D1,D2,Mvs,B,D,AllMvs,S).
try_moves(D1,D2,0,_,_,_,S) :-
    D2 := D1, display(S).


Program 8.10: The Worker Code 




program execution when only a subset of the workers have work left to complete. 
Thus, its efficiency depends on having a long period of fully active computation 
between the end of ramp-up and the beginning of ramp-down. The performance 
of the technique is also sensitive to the relative size of subproblems since this may
affect the length of the ramp-down period.
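
The sensitivity to subproblem size can be explored with a rough Python model. The model is an assumption made for illustration, not a measurement: each subproblem goes to whichever worker becomes idle first, communication and ramp-up costs are ignored, and efficiency is taken as total work divided by workers times elapsed time for a non-empty task list.

```python
import heapq


def efficiency(task_times, n_workers):
    """Fraction of worker time spent computing under idle-first allocation."""
    free_at = [0.0] * n_workers      # time at which each worker next becomes idle
    heapq.heapify(free_at)
    for t in task_times:
        start = heapq.heappop(free_at)   # the earliest idle worker takes the task
        heapq.heappush(free_at, start + t)
    makespan = max(free_at)
    return sum(task_times) / (n_workers * makespan)


if __name__ == "__main__":
    print(efficiency([1] * 40, 4))           # equal-sized subproblems
    print(efficiency([1] * 36 + [10], 4))    # one large subproblem issued last
```

In this model a single large subproblem issued late leaves three of the four workers idle for most of the ramp-down, which is the effect the paragraph above describes.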


8.5 A Multilingual Mapping 


We conclude this chapter by describing the conversion of an existing Fortran ap- 
plication to a multi-lingual program that may execute on a parallel computer. In 
developing this program, we utilize the techniques developed in this chapter and 
those introduced in Chapter 7. 

The problem we consider is a typical grid problem. In general, such problems 
are characterized as follows. 


• They have a cellular space.
• Each cell has a state that is characterized by one or more numeric values.
• A neighborhood function exists that defines the set of neighbors of a cell.
• Time is represented as a discrete sequence $t_0, t_1, \ldots$; there exists a transition
function that defines the state of a cell at time $t_{i+1}$ in terms of the state of
the cell and its neighbors at time $t_i$.


A particularly simple grid problem known as the Dirichlet problem is useful in
a number of situations, including electromagnetic theory. The state of each cell
in a two-dimensional space is represented by a single floating-point value. Values
are initially given for boundary cells; interior cells initially have value 0. A cell’s
neighbors are the four cells whose x or y coordinate differs from its own by one. As
time progresses, the values in the cells converge to a solution to Laplace’s equation
$\nabla^2 \Phi = 0$, with the original values of $\Phi$ on the boundary.
The transition function for the Dirichlet problem is defined as follows: 


• Cells on the boundary have constant states.
• For a non-boundary cell, the state at time $t_{i+1}$ is the average of the states
of its neighbors at time $t_i$.


In principle, transitions should be applied repeatedly until some convergence cri- 
terion is satisfied. For simplicity, the program given here applies a fixed number 
of transitions. 




8.5.1 Mapping Grid Problems 


Grid problems can naturally be adapted for parallel execution by data decompo- 
sition. A grid is partitioned into subgrids; each subgrid is then located on a node 
in some virtual machine. Transitions are then performed on each subgrid concur- 
rently. We choose to decompose Dirichlet’s problem by partitioning the grid into 
sets of contiguous columns. This decomposition changes the problem structure 
into a line; as in previous examples, a line is mapped to a ring virtual machine. 
Each node in the virtual machine manages one or more columns of the grid. 

Recall that the values computed at each transition are determined by the 
neighbors of each non-boundary point. Thus, each node must communicate its left 
and right columns to neighboring nodes prior to each transition. The following 
figure shows the initial state of an 8x4 grid. 


BL          G1                 G2           BR
5.1     5.3  4.2  3.8     2.4  2.6  1.4     0.3
6.5 <-> 0.0  0.0  0.0 <-> 0.0  0.0  0.0 <-> 9.7
5.3     0.0  0.0  0.0     0.0  0.0  0.0     8.4
1.2     1.4  3.6  5.8     9.3  7.2  5.6     3.6


The grid is partitioned into two three-column subgrids (G1,G2) plus subgrids rep- 
resenting the left and right boundaries (BL,BR). The exchange of columns that 
occurs prior to a transition is indicated by arrows in the diagram. The following 
figure shows the result of applying a single transition at a single node. It focuses 
on the data required to apply a transition to the grid G1 in the previous figure. 
The output of the transition is also illustrated: 


Left         Grid         Right       New Grid
Col                       Col

5.1     5.3  4.2  3.8     2.4     5.3  4.2  3.8
6.5  +  0.0  0.0  0.0  +  0.0  =  3.0  1.0  1.0
5.3     0.0  0.0  0.0     0.0     1.7  0.9  1.5
1.2     1.4  3.6  5.8     9.3     1.4  3.6  5.8
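
The numbers in this figure can be reproduced with a short Python sketch of the transition. It is an illustration only, not the Fortran routine: it assumes that columns are lists running from the top row to the bottom row, that the top and bottom rows are boundary cells, and that each interior cell becomes the average of its left, right, upper and lower neighbors.

```python
def transition(left, grid, right):
    """Apply one transition to a subgrid given its neighbors' edge columns."""
    cols = [left] + grid + [right]
    rows = len(left)
    new_grid = []
    for c in range(1, len(cols) - 1):
        col = list(cols[c])              # top and bottom cells keep their values
        for r in range(1, rows - 1):
            col[r] = (cols[c - 1][r] + cols[c + 1][r]
                      + cols[c][r - 1] + cols[c][r + 1]) / 4.0
        new_grid.append(col)
    return new_grid


if __name__ == "__main__":
    left = [5.1, 6.5, 5.3, 1.2]
    g1 = [[5.3, 0.0, 0.0, 1.4], [4.2, 0.0, 0.0, 3.6], [3.8, 0.0, 0.0, 5.8]]
    right = [2.4, 0.0, 0.0, 9.3]
    for col in transition(left, g1, right):
        print([round(v, 2) for v in col])
```

For example, the first interior cell of G1 becomes (6.5 + 0.0 + 5.3 + 0.0) / 4 = 2.95, which the figure shows to one decimal place as 3.0.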


A parallel algorithm for the Dirichlet problem can hence be defined in terms of 
a number of subgrid processes and two boundary processes. A subgrid process 
repeatedly: 


1. Receives columns from left and right neighbors. 
2. Performs a transition. 


3. Transmits the modified left and right columns to its neighbors. 


A boundary process manages a boundary column and repeatedly: 


1. Receives a column from a neighbor (which it ignores). 


2. Transmits its column to its neighbor. 


The actions of the different processes are synchronized by step 1 in both cases: 
Processes only proceed to step 2 when they have received the necessary columns 
from their neighbors. 
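
That the column decomposition computes the same values as a transition over the whole grid can be checked with a sequential Python sketch. It is not the Strand program: the exchange of columns is modeled by simply reading the neighbors' edge columns, and `transition` assumes a four-neighbor average with fixed top and bottom rows.

```python
def transition(left, grid, right):
    """One transition on a list of columns, given the two adjacent columns."""
    cols = [left] + grid + [right]
    new_grid = []
    for c in range(1, len(cols) - 1):
        col = list(cols[c])
        for r in range(1, len(col) - 1):
            col[r] = (cols[c - 1][r] + cols[c + 1][r]
                      + cols[c][r - 1] + cols[c][r + 1]) / 4.0
        new_grid.append(col)
    return new_grid


def parallel_step(boundary_left, subgrids, boundary_right):
    """One receive-compute cycle for every subgrid."""
    # Step 1: each subgrid receives the facing column of each neighbor.
    lefts = [boundary_left] + [g[-1] for g in subgrids[:-1]]
    rights = [g[0] for g in subgrids[1:]] + [boundary_right]
    # Step 2: the transitions are independent and could run concurrently.
    return [transition(l, g, r) for l, g, r in zip(lefts, subgrids, rights)]


if __name__ == "__main__":
    cols = [[5.1, 6.5, 5.3, 1.2], [5.3, 0.0, 0.0, 1.4], [4.2, 0.0, 0.0, 3.6],
            [3.8, 0.0, 0.0, 5.8], [2.4, 0.0, 0.0, 9.3], [2.6, 0.0, 0.0, 7.2],
            [1.4, 0.0, 0.0, 5.6], [0.3, 9.7, 8.4, 3.6]]
    parts = [cols[1:4], cols[4:7]]
    for _ in range(3):
        parts = parallel_step(cols[0], parts, cols[7])
    print(parts)
```

Because every subgrid reads its neighbors' columns before any subgrid is updated, the sketch preserves the rule that a process proceeds to step 2 only after step 1 is complete.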


8.5.2 The Interface 


Assume the existence of a Fortran routine that performs a transition on a grid 
represented by a two-dimensional array of floating-point numbers. We construct 
a multi-lingual program that uses this routine to perform transitions on subgrids. 
The Strand component of this program performs the communication and synchro- 
nization component of the parallel algorithm outlined previously. 

As a first step, we define a user-defined data type GRID that encapsulates a 
two-dimensional array of floating-point numbers; this will represent a subgrid. The 
following operations on this data type are required: 


• Create a grid.
• Access left and right columns of a grid.
• Perform a transition on a grid.
• Access the contents of a grid.


These operations are defined as: 


make_grid(Numbers,Grid): Takes a list of columns and creates a grid.
Each column is represented by a list of floating-point numbers. 


columns(Grid,LCol,RCol): Takes a grid and returns subgrids represent- 
ing its left and right columns. 


grid_trans(LCol,Grid,RCol,NewGrid): Takes two one-column grids, LCol 
and RCol, and a Grid, applies the transition function, and returns a 
new grid that represents the transformed grid. 


access_grid(Grid,HCols,TCols): Takes a grid and generates a difference
list of columns. 


In summary, the first and last operations convert between Strand data types and 
the GRID data type. The second accesses grid columns. The third invokes the 
Fortran transition program to perform a transition. 




8.5.3 The Strand Program 


Let us assume that the four user-defined operations have been implemented and 
tested on simple data. It is now possible to write a Strand program to implement 
the parallel grid algorithm outlined previously. This program must create the 
processes that manage the grid and define the actions these processes perform. 

Creating the Process Network. The initial process network consists of a 
number of subgrid processes plus two boundary processes. The boundary processes 
manage the left and right columns; each subgrid process manages approximately 
the same number of internal columns. These processes are connected in a doubly- 
linked chain using shared variables, as illustrated in Figure 8.9. In outline, the 
process structure required to implement this organization on a ring virtual machine 
is: 


grid(...) :- boundary(...), subgrids(...)@fwd.

subgrids(...) :- subgrid(...), subgrids(...)@fwd.
subgrids(...) :- boundary(...).


       Li                            Ro
    <--    <--    <--   ...   <--
 BL     G1     G2     G3      Gp     BR
    -->    -->    -->   ...   -->
       Lo                            Ri


Figure 8.9 Grid Problem Process Network 


The grid is built by spawning a boundary process at the initial node; processes that 
manage subgrids are spawned on subsequent nodes. When all subgrids have been 
spawned, the final boundary process is spawned on the next node. Program 8.11 
is the complete program to generate the process structure. For simplicity, each 
subgrid is assumed to contain the same number of columns. 

The program executes on a ring virtual machine (R1). It takes as arguments 
a list of columns (Vs), the number of columns per subgrid (K), the number of 
iterations to perform (I) and an output variable for the solution (Soln). The
process mapping follows the outline described previously (R2-4). 

To initialize a boundary process, a single column is selected from the list of 
grid columns (R5). The user-defined operation make_grid is used to create a
one-column grid (Col) from this column (R5). This is passed to a boundary process
and also sent to the neighboring subgrid process, thus initiating communication 
(R5). 

To initialize a subgrid process, K columns are selected from the list of grid 
columns (R6-8); these are used to construct an initial subgrid. The user operation 




-machine(ring).                                            % R1

grid(Vs,K,I,Soln) :-                                       % R2
    boundary_init(Vs,Vs1,Li,Lo),
    subgrids(K,Vs1,_,I,Soln,[],Lo,Li)@fwd.

subgrids(K,[A,B|Vs1],Vs3,I,Sb,Se,Li,Lo) :-                 % R3
    subgrid_init(K,[A,B|Vs1],Vs2,I,Sb,Sm,Li,Lo,Mo,Mi),
    subgrids(K,Vs2,Vs3,I,Sm,Se,Mi,Mo)@fwd.
subgrids(_,[Vs],_,_,Sb,Se,Li,Lo) :-                        % R4
    Sb := Se, boundary_init([Vs],_,Li,Lo).

boundary_init([Column|Vs],Vs1,In,Out) :-                   % R5
    Vs1 := Vs, Out := [Col|Out1],
    make_grid([Column],Col), boundary(Col,In,Out1).

subgrid_init(K,Vs,Vs1,I,Sb,Se,Li,Lo,Ri,Ro) :-              % R6
    select(K,Vs,Vs1,Cols),
    make_grid(Cols,Grid), columns(Grid,LCol,RCol),
    Lo := [LCol|Lo1], Ro := [RCol|Ro1],
    subgrid(Grid,I,Sb,Se,Li,Lo1,Ri,Ro1).

select(K,[Col|Vs],Vs1,Cols) :-                             % R7
    K > 0 | Cols := [Col|Cols1], K1 is K - 1,
    select(K1,Vs,Vs1,Cols1).
select(0,Vs,Vs1,Cols) :- Vs1 := Vs, Cols := [].            % R8


Program 8.11: Creating Grid Processes 


columns is then applied to extract the initial grid’s left and right columns; these are 
passed to neighboring subgrid processes. Each subgrid process is given a portion 
of a difference list representing the final grid (Sb,Se). 


Process Actions. The subgrid and boundary processes must perform a number 
of iterations of a receive-compute-transmit cycle. At each iteration, they wait 
to receive columns from neighboring processes, apply a transition and pass new 
columns to their neighbors. After I iterations, the columns in the subgrids are
collected to form the solution. Program 8.12 defines the two processes. 


At each iteration, a subgrid process awaits left and right columns from its 
neighbors (R1). It then invokes the grid_trans operation to perform a transition; the 
left and right columns of the new grid are communicated to neighboring processes 
(R1). After I iterations, the subgrid process uses the access_grid operation to
construct its component of the solution (R2). 




subgrid(Grid,I,Sb,Se,[LCol|Li],Lo,[RCol|Ri],Ro) :-         % R1
    I > 0 | I1 is I - 1,
    grid_trans(LCol,Grid,RCol,NewGrid),
    columns(NewGrid,NewLCol,NewRCol),
    Lo := [NewLCol|Lo1], Ro := [NewRCol|Ro1],
    subgrid(NewGrid,I1,Sb,Se,Li,Lo1,Ri,Ro1).
subgrid(Grid,0,Sb,Se,_,Lo,_,Ro) :-                         % R2
    Lo := [], Ro := [], access_grid(Grid,Sb,Se).

boundary(Col,[_|In],Out) :-                                % R3
    Out := [Col|Out1], boundary(Col,In,Out1).
boundary(_,[],_).                                          % R4


Program 8.12: Subgrid and Boundary Processes 


At each iteration, a boundary process waits to receive a column from its neigh- 
bor before replying with its one-column grid (R3). The column received from the 
neighbor is ignored; it serves simply to synchronize the generation of columns with 
the computation being performed by subgrid processes. 

In summary, all processes alternately receive, compute and transmit. No pro- 
cess can perform a transition until it has received columns from its neighbors. 
As these represent the result of the previous transition, data-flow synchronization 
ensures the correct sequencing of computation. 


8.5.4 Improving Memory Performance 


The grid_trans operation defined earlier creates a new copy of the grid; the old 
grid is simply discarded. In principle, a Strand implementation could detect that 
the memory occupied by the old grid is available for reuse; however, in general 
a garbage collector must be run to reclaim this memory. This can be expensive, 
particularly if grids are large. 

One way to reduce memory turnover is to allocate two grids initially and to 
make the grid program alternate between them. At each iteration, the program 
reads one grid and writes the other; after each transition, the two grids exchange 
roles. Grids hence become mutable structures. This requires that the grid_trans 
operation be redefined to take two grids as arguments. One is the grid to be 
transformed, and the other is a grid to be used to hold the result. 


grid_trans(LCol,G1,RCol,G2,NG1,NG2): Takes two one-column grids, LCol
and RCol, a grid to be transformed, G1, and a grid in which the output
is to be placed, G2. Applies the transition function to G1; returns G1
and G2 as NG2 and NG1, respectively.




A number of changes must be made to Programs 8.11 and 8.12 to utilize this 
operation. The process that initializes a subgrid must create two copies. Note 
the two invocations of the make_grid operation and the additional argument to the 
subgrid process (Grid2): 


subgrid_init(K,Vs1,Vs2,I,Sb,Se,Li,Lo,Ri,Ro) :-
    select(K,Vs1,Vs2,Cols),
    make_grid(Cols,Grid1), make_grid(Cols,Grid2),
    columns(Grid1,LCol,RCol),
    Lo := [LCol|Lo1], Ro := [RCol|Ro1],
    subgrid(Grid1,Grid2,I,Sb,Se,Li,Lo1,Ri,Ro1).


The subgrid process must alternate between the use of the two grids: 


subgrid(Grid1,Grid2,I,Sb,Se,[LCol|Li],Lo,[RCol|Ri],Ro) :-
    I > 0 | I1 is I - 1,
    grid_trans(LCol,Grid1,RCol,Grid2,NewGrid1,NewGrid2),
    columns(NewGrid1,NewLCol,NewRCol),
    Lo := [NewLCol|Lo1], Ro := [NewRCol|Ro1],
    subgrid(NewGrid1,NewGrid2,I1,Sb,Se,Li,Lo1,Ri,Ro1).
subgrid(Grid1,_,0,Sb,Se,_,Lo,_,Ro) :-
    Lo := [], Ro := [], access_grid(Grid1,Sb,Se).


At each iteration, the grid written in the previous iteration but one (Grid2) is 
reused. Data-flow synchronization ensures that reuse does not lead to race con- 
ditions; a subgrid process does not reuse a grid before its left and right columns 
have been extracted, sent to neighboring processes and acknowledged. 
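
The alternation between two preallocated grids can be sketched in Python. This is an illustration under stated assumptions, not the Fortran routine: `transition_into` is a hypothetical name, the neighboring columns are held fixed as they would be for a single node between two boundaries, and the transition is taken to be a four-neighbor average with fixed top and bottom rows.

```python
def transition_into(left, src, right, dst):
    """Read src, write the transformed grid into dst, and swap their roles."""
    cols = [left] + src + [right]
    for c in range(1, len(cols) - 1):
        out = dst[c - 1]
        out[0], out[-1] = cols[c][0], cols[c][-1]   # boundary rows are constant
        for r in range(1, len(out) - 1):
            out[r] = (cols[c - 1][r] + cols[c + 1][r]
                      + cols[c][r - 1] + cols[c][r + 1]) / 4.0
    return dst, src   # NG1 is the grid just written, NG2 the one to reuse


def iterate(left, grid, right, steps):
    """Run a fixed number of transitions using only two grid buffers."""
    g1 = [list(col) for col in grid]   # corresponds to the first make_grid
    g2 = [list(col) for col in grid]   # corresponds to the second make_grid
    for _ in range(steps):
        g1, g2 = transition_into(left, g1, right, g2)
    return g1


if __name__ == "__main__":
    left = [5.1, 6.5, 5.3, 1.2]
    right = [2.4, 0.0, 0.0, 9.3]
    grid = [[5.3, 0.0, 0.0, 1.4], [4.2, 0.0, 0.0, 3.6], [3.8, 0.0, 0.0, 5.8]]
    print(iterate(left, grid, right, 5))
```

No grid is allocated inside the loop; the buffer that held the input of one transition becomes the output of the next, which is the exchange of roles described above.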


8.5.5 Discussion 


The grid problem examined in this section is a typical numeric computation. The 
multi-lingual program that we have presented implements a parallel grid algorithm 
based on data decomposition. 

The program required one user-defined data type, GRID, and four user-defined 
operations. Two approaches to the design of the user operations were presented. 
The first created a new grid at each iteration. This approach obeyed all six of 
the design principles presented in Section 7.3. The second approach sought to 
reduce memory turnover by creating two grids initially and reusing these grids 
at each iteration. This violated the single-assignment rule. However, data-flow 
synchronization ensured that operations on grids were strictly sequenced; the use 
of mutable structures did not change program behavior. 


8.6 Summary 


This chapter has demonstrated the use of Strand’s process mapping tools. These
are based on the concept of a virtual machine that is a collection of connected
computing sites called nodes. Virtual machines are easier to program than the
raw hardware and allow us to write portable, scalable concurrent programs.


Three types of mapping have been demonstrated. Ring mappings used simple
direction annotations of fwd and bwd. Torus mappings used direction annotations
of north, east, south and west. Load-balancing strategies allocate work dynamically
in response to changing demand.


Exercises 


8.1 Modify the integration program, designed for a ring, to calculate M strips at
a single node. Study the performance of this program on a parallel computer
for increasing M.


8.2 Complete the refinement of the torus integration program. Compare its
performance to that of the ring program.


8.3 The integration problem can be tackled using a divide-and-conquer strategy.
To integrate an interval, the interval is divided in half, each subinterval is
calculated and the two are summed. This leads to a recursive specification
whose structure is a tree. Write this program and map it to a binary-tree
structured virtual machine that has annotations left and right. Form an
alternative program that maps the program to a torus virtual machine.


8.4 The merge sort program is unlikely to produce substantial speedups because
it only spawns O(log N) merge stages. Give an alternative program that
computes each merge pair in parallel.


8.5 Design a concurrent program which generates all possible pairings of elements
from two lists L1 and L2. This program should execute on a ring virtual
machine.


8.6 Implement the processes required to solve the grid problem presented in
Section 8.5. Study its behavior on varying-sized grids. How big do subgrids
need to be before reasonable speedups are obtained?


8.7 Develop an alternative solution to the grid problem that uses a torus virtual
machine. You will need to decompose the data along both dimensions.


8.8 Develop a three-dimensional solution to the grid problem suitable for execution
on a cube virtual machine. This supports direction annotations: fwd,
back, north, east, south, west.










Chapter 9 


Metaprogramming 


Metaprogramming is an elaborate term for a simple activity: the writing of pro- 
grams that take other programs as data. Strand’s simple, recursively-defined data 
structures make it easy to both represent programs as data and write programs 
that manipulate them. 

Metaprograms include program analyzers, transformers and interpreters. This 
chapter illustrates each of these applications. Initially, we develop a simple pro- 
gram analyzer derived from a program presented in Section 2.10. We then present 
a simple Strand interpreter, written in Strand, and show how enhancements to this 
program can allow it to trace program execution. Finally, we show how language 
extensions can be incorporated into programs by source-to-source transformations. 


Metaprogramming Applications: 


• Program analysis.
• Source-to-source transformations.
• Interpretation.





9.1 Program Analysis 


The key to using Strand for metaprogramming is the equivalence of program and 
data. Strand programs have a natural representation as Strand terms. A process 
definition is represented by a list of terms, each of which corresponds to a single 
rule with ‘:-’, ‘|’ and ‘,’ viewed as infix operators. Thus, the rule H :- G | B
is represented by the term ':-'(H,'|'(G,B)). In addition, a variable V in the original
program is represented as a tuple of the form '_var'('V'). This representation is
somewhat deficient as it prevents the use of tuples of this form in programs. 
However, it is simple and sufficient for most practical purposes. 

Figure 9.1 shows a program rule and a Strand representation of this rule. Observe
that as ‘:-’ and ‘|’ are infix operators, both structures are Strand terms. The
only difference between the two terms is their differing representation of variables.


member(E,[E1|Es],R) :- E =\= E1 | member(E,Es,R).


':-'(
    member('_var'('E'),['_var'('E1')|'_var'('Es')],'_var'('R')),
    '|'(
        '=\='('_var'('E'),'_var'('E1')),
        member('_var'('E'),'_var'('Es'),'_var'('R'))
    )
)


Figure 9.1: Program = Data 


From these discussions it is evident that a program is just a tree structure. This 
implies that the techniques used to inspect Strand terms can also be applied to 
programs. To illustrate this point we adapt the tree analysis program introduced 
in Section 2.10 that solved the following problem: 


“Given a ground term X (i.e. one that contains no variables), deter- 
mine how many integers and strings are contained in X.” 


We now pose the following related problem: 


Problem 9.1: “Given a program X determine how many integers and 
strings are contained in X.” 


Program 9.1 solves this new problem with only minor changes to the original 
program. These changes are indicated by comments in the program text: Two 
rules are added and one is modified. Recall that the original program recursively 
inspected the input term, counting each integer and string encountered. The 
first modification specifies that if the input term represents a variable, counting 
terminates (R1). If the input term is a tuple of arity two that does not represent 
a variable, its arguments are scanned (R2); scanning is also performed if the input 
term is a tuple with an arity other than two (R3). 

Program 9.1 is a simple example of a program that analyzes a program. Clearly 
the same techniques can be applied to a wide range of analysis problems such as 
those associated with optimizing compilers and debuggers. 


scan(T,Icnt,Scnt) :- scan1(T,0,Icnt,0,Scnt).


scan1(T,Ii,Io,Si,So) :-
    integer(T) | Io is Ii + 1, So := Si.
scan1(T,Ii,Io,Si,So) :-
    string(T) | So is Si + 1, Io := Ii.
scan1(T,Ii,Io,Si,So) :-
    real(T) | Io := Ii, So := Si.
scan1([],Ii,Io,Si,So) :-
    Io := Ii, So := Si.
scan1([Head|Rest],Ii,Io,Si,So) :-
    scan1(Head,Ii,I1,Si,S1), scan1(Rest,I1,Io,S1,So).


scan1({A,_},Ii,Io,Si,So) :-                  % R1: New rule
    A == '_var' | Io := Ii, So := Si.
scan1({A,B},Ii,Io,Si,So) :-                  % R2: New rule
    A =\= '_var' | scan_args({A,B},2,Ii,Io,Si,So).
scan1(T,Ii,Io,Si,So) :-                      % R3: Modified rule
    tuple(T), T =\= {_,_} |
    length(T,A), scan_args(T,A,Ii,Io,Si,So).


scan_args(Tuple,On,Ii,Io,Si,So) :-
    On > 0 |
    get_arg(On,Tuple,Arg), On1 is On - 1,
    scan1(Arg,Ii,I1,Si,S1),
    scan_args(Tuple,On1,I1,Io,S1,So).
scan_args(_,0,Ii,Io,Si,So) :- Io := Ii, So := Si.


Program 9.1: Solution to Problem 9.1 
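
The same traversal can be sketched in Python as a point of comparison. This is an illustration under assumptions of our own, not a translation of Strand semantics: Strand tuples are modelled as Python tuples, lists as Python lists, a structure such as p(X) as the tuple ("p", X), and a variable as the pair ("_var", name).

```python
def scan(term):
    """Return (integer_count, string_count) for a term that represents a program."""
    if isinstance(term, int):
        return (1, 0)
    if isinstance(term, str):
        return (0, 1)
    if isinstance(term, float):
        return (0, 0)
    if isinstance(term, tuple) and len(term) == 2 and term[0] == "_var":
        return (0, 0)              # a variable representation: stop counting (R1)
    if isinstance(term, (list, tuple)):
        ints = strs = 0
        for arg in term:           # scan every element or argument (R2, R3)
            i, s = scan(arg)
            ints += i
            strs += s
        return (ints, strs)
    raise TypeError("unsupported term: %r" % (term,))
```

For instance, scan(("p", 1, "a", [2, ("_var", "X")], ("q", 3.5, "b"))) counts the integers 1 and 2 and the strings p, a, q and b; neither the marker "_var" nor the variable name X is counted.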


9.2 Interpreters 


An interpreter is a program that symbolically executes and hence simulates the 
behavior of another program. Interpretation is considerably less efficient than 
direct execution of a compiled program. Hence, interpreters are not particularly 
valuable from a practical perspective but are sometimes useful for debugging. 

We will illustrate the principles of interpretation using a simple language involving
arithmetic expressions. This language provides four infix operators: +, -,
* and /, and supports real and integer numbers. Expressions in the language can
be represented as Strand data structures. For example:


2 * (5 - 8/4)


Program 9.2 implements an interpreter for this language; each rule in this program
describes how to simplify an expression. The interpreter relies on the underlying
Strand implementation to ensure that expressions are evaluated in the correct
order.


-exports([sim/2]).


sim(A+B,O) :- sim(A,O1), sim(B,O2), O is O1+O2.
sim(A-B,O) :- sim(A,O1), sim(B,O2), O is O1-O2.
sim(A*B,O) :- sim(A,O1), sim(B,O2), O is O1*O2.
sim(A/B,O) :- sim(A,O1), sim(B,O2), O is O1/O2.
sim(I,O) :- integer(I) | O := I.
sim(R,O) :- real(R) | O := R.


Program 9.2: A Simple Interpreter


To evaluate the above expression, the interpreter is executed using the
following initial process:


sim(2 * (5 - 8/4),R).


Execution of the process simulates evaluation of the expression 2 * (5 - 8/4)
and computes the result: R = 6. Table 9.1 shows the shortest possible execution
sequence to compute the result.

Note that it is easy to perform the example computation more efficiently by
defining, compiling and executing the following program:


comp(R) :- R is 2 * (5 - 8/4).


Execution of this alternative program involves only three simple arithmetic operations;
interpretation of the expression 2 * (5 - 8/4) involves many process
reductions. This comparison emphasizes the inefficiencies associated with inter- 
pretation. 
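
The structure of Program 9.2 can be mirrored in a conventional language. The following Python sketch is illustrative only and is not Strand. It assumes that expressions are encoded as nested tuples (op, left, right), an encoding chosen for this example, and it evaluates subexpressions one after the other, where the Strand program creates concurrent sim processes.

```python
def sim(expr):
    """Simplify an arithmetic expression given as a number or an (op, left, right) tuple."""
    if isinstance(expr, (int, float)):
        return expr                    # integers and reals simplify to themselves
    op, left, right = expr
    x, y = sim(left), sim(right)       # simplify both operands first
    if op == "+":
        return x + y
    if op == "-":
        return x - y
    if op == "*":
        return x * y
    if op == "/":
        return x / y                   # note: Python's / always yields a float
    raise ValueError("unknown operator: %r" % (op,))
```

The expression 2 * (5 - 8/4) is written ("*", 2, ("-", 5, ("/", 8, 4))) in this encoding. As in the Strand version, one interpretive step is taken for every node of the expression tree, which is the overhead that the compiled comp program avoids.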


9.2.1 A Strand Interpreter 


We now present a simple interpreter for the Strand language and show how it 
can be extended to trace program execution. Recall that a Strand interpreter was 
presented in Section 2.8 using a Pascal-like notation. This program maintained 
a process pool as a data structure and explicitly manipulated process arguments 
when attempting to reduce processes. Process definitions were represented as 
data. This program is interesting because of the insights it provides into Strand’s 
operational model. However, it is quite complex and if coded in Strand would not 
be very efficient. 

Step  Pick  Result          Process Pool
1     -     -               sim(2*(5-8/4),R)
2     1     change state    sim(2,O1), sim(5-8/4,O2), R is O1*O2
            + fork
3     1     terminate       sim(5-8/4,O2), R is 2*O2
4     1     change state    sim(5,O3), sim(8/4,O4), R is 2*O2, O2 is O3-O4
            + fork
5     1     terminate       sim(8/4,O4), R is 2*O2, O2 is 5-O4
6     1     change state    sim(8,O5), sim(4,O6), R is 2*O2, O4 is O5/O6, O2 is 5-O4
            + fork
7,8   1,2   terminate both  R is 2*O2, O4 is 8/4, O2 is 5-O4
9     2     terminate       R is 2*O2, O2 is 5-2
10    2     terminate       R is 2*3
11    1     terminate       empty and R = 6


Table 9.1: Interpreting Arithmetic Expressions


Fortunately, it is possible to write a more concise Strand interpreter in Strand
by delegating uninteresting aspects of program execution to the underlying implementation.
For example, we frequently wish to observe or control process actions
such as process creation and termination, but are not generally concerned with the
details of head matching, guard execution, process suspension and error reporting.
For convenience, each process definition that is to be interpreted is transformed 
into an alternative form. This form, when used to execute a process, returns a term 
corresponding to the body of a rule that can be used to reduce the process. To 
illustrate the transformation, we show both the original and transformed versions 
of the member process definition. Recall the original definition from Section 2.6: 


member(E,[E|_],R) :- R := true.
member(E,[E1|Es],R) :- E =\= E1 | member(E,Es,R).
member(_,[],R) :- R := false.


The transformed definition, expressed in terms of a rule process, is:


rule(member(E,[E|_],R), B) :- B := (R := true).
rule(member(E,[E1|Es],R), B) :- E =\= E1 | B := member(E,Es,R).
rule(member(_,[],R), B) :- B := (R := false).


Each rule in the original process definition is represented by a new rule in the 
transformed definition. The head of the original rule is given as the first argument 
to the new rule. The guard of the original rule (if any) is retained as the guard of 
the new rule. A term representing the body of the original rule is returned as the 
second argument. 

In essence, the rule process directly executes, not interprets, the matching and 
guard evaluation required to determine whether a rule can be used to reduce a 
process. Consider the execution of the following processes: 


rule(member(2,[2,7],R),B) assigns R := true to B
rule(member(2,[1,2],R),B) assigns member(2,[2],R) to B
rule(member(2,[],R),B) assigns R := false to B


Predefined processes can also be expressed in this form and executed directly, for 
example: 


rule(X := Y,B) :— X := Y, B := true. 
rule(X is Y+Z,B) :— X is Y+Z, B := true. 


A process rule(X := meow, B) hence assigns the value meow to X. 

It is now possible to use the rule form to define a variety of Strand interpreters. 
Program 9.3 shows an interpreter reduce, written in Strand, that concerns itself 
only with process actions. The process terminates when applied to the string 
true (R1). It creates a new instance of the interpreter for each component in a 
conjunction (R2). Otherwise, it creates a rule process to determine the body of a 
rule that can be used to reduce the process; then it simulates the execution of this 
body (R3). 


-exports([reduce/1]).

reduce(true).                                       % R1
reduce((P1,P2)) :- reduce(P1), reduce(P2).          % R2
reduce(Process) :-                                  % R3
    Process =\= true, Process =\= (_,_) |
    rule(Process,Body), reduce(Body).


rule(member(E,[E|_],R),B) :- B := (R := true).
rule(member(E,[E1|Es],R),B) :- E =\= E1 | B := member(E,Es,R).
rule(member(_,[],R),B) :- B := (R := false).

rule(X := Y,B) :- X := Y, B := true.


Program 9.3: A Strand Interpreter 


Execution of Program 9.3 creates a pool of reduce processes, each of which simulates
the execution of a process in the original program. The program does not
concern itself with how these processes are scheduled: process pool management,
like input matching and guard execution, is delegated to the Strand implementation.
The simulation is equivalent in all important respects (except efficiency) to
direct execution. The same values are generated for variables and run-time errors
are signaled in the same manner. To illustrate the differences between direct and
simulated execution, consider the process:


reduce(member(2,[1,2,3],R))


This computes the result R = true, as does direct execution of the process:


member(2,[1,2,3],R)


Table 9.2 shows example process pools when executing and interpreting the pro- 
cess member(2,[1,2,3],R). Each line represents a step in the interpretation. 


Table 9.2: Simulated and Direct Execution


Simulated Execution                        Direct Execution

reduce(member(2,[1,2,3],R))                member(2,[1,2,3],R)
rule(member(2,[1,2,3],R),B), reduce(B)
B := member(2,[2,3],R), reduce(B)
reduce(member(2,[2,3],R))                  member(2,[2,3],R)
rule(member(2,[2,3],R),B1), reduce(B1)
B1 := true, reduce(B1)
reduce(true)
empty and R = true                         empty and R = true


9.2.2 A Tracing Interpreter 


Program 9.3 is not particularly useful but is interesting because it can easily 
be extended to produce enhanced interpreters. These not only execute pro- 
grams but also perform additional functions. For example, consider Program 9.4 
which differs from Program 9.3 in two respects. First, an additional process, 
display({Process,Body},D), has been added to the fourth rule. This requests the 
Strand system to display the term {Process,Body}. Second, an additional argu- 
ment is threaded through the process structure. This is used to ensure that the 
order in which terms are displayed corresponds to the partial order in which re- 
ductions are performed. Thus, body processes are displayed after the process that 
spawned them. 

-exports([reduce/1]).

reduce(P) :- reduce(P,[]).


reduce(true,_).
reduce((P1,P2),D) :-
    reduce(P1,D), reduce(P2,D).
reduce(Process,[]) :-
    Process =\= true, Process =\= (_,_) |
    rule(Process,Body),
    display({Process,Body},D),
    reduce(Body,D).


rule(member(E,[E|_],R),B) :- B := (R := true).
rule(member(E,[E1|Es],R),B) :-
    E =\= E1 | B := member(E,Es,R).
rule(member(_,[],R),B) :- B := (R := false).
rule(X := Y,B) :- assign(X,Y,D), done(D,B).


done([],B) :- B := true.


Program 9.4: A Tracing Interpreter


Program 9.4 can be used to simulate the execution of programs in the same
way as Program 9.3. Interpretation is again equivalent to direct execution, except
that additional display processes are now created at each simulated reduction.
Each process selected for reduction by the interpreter is therefore displayed on the
screen. For example, execution of the process:


reduce(member(3,[1,2,3],R))


generates the following output:


{member(3,[1,2,3],R), member(3,[2,3],R)}
{member(3,[2,3],R), member(3,[3],R)}
{member(3,[3],R), R := true}
{R := true, true}


Each line represents a process reduction and shows the process being reduced and 
the process(es) to which it is reduced. A minor extension to the original inter- 
preter has hence yielded something useful: a simple execution tracer. A variety of 
other useful extensions are suggested in the exercises. 
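
The division of labour between rule and reduce can be sketched in Python. The sketch is an analogy only: it replaces the concurrent pool of reduce processes by a sequential worklist, records each reduction in a list instead of calling display, and uses a small single-assignment Var class and tuple encodings such as ("member", E, List, R), all of which are assumptions made for this example.

```python
class Var:
    """A single-assignment variable (hypothetical helper for this sketch)."""
    def __init__(self, name):
        self.name, self.value = name, None

    def __repr__(self):
        return self.name


TRUE = "true"


def rule(process):
    """Return the body to which `process` reduces (the rule form)."""
    if process[0] == "member":
        _, e, items, r = process
        if not items:
            return (":=", r, "false")
        if items[0] == e:
            return (":=", r, "true")
        return ("member", e, items[1:], r)
    if process[0] == ":=":
        _, var, value = process
        var.value = value              # executed directly, not interpreted
        return TRUE
    raise ValueError("no rule for %r" % (process,))


def reduce(process, trace):
    """Reduce `process` to completion, appending (process, body) pairs to `trace`."""
    pool = [process]
    while pool:
        p = pool.pop()
        if p == TRUE:
            continue                       # R1: nothing left to do
        if p[0] == ",":
            pool.extend([p[2], p[1]])      # R2: reduce both halves of a conjunction
            continue
        body = rule(p)                     # R3: select a rule, then reduce its body
        trace.append((p, body))
        pool.append(body)
```

With r = Var("R") and trace = [], the call reduce(("member", 3, [1, 2, 3], r), trace) leaves r.value equal to "true" and four entries in trace, corresponding to the four lines of output shown above.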


Writing interpreters:


• Interpreters are programs that manipulate programs.
• Interpreters are developed, refined and combined in the same way as
other programs.





9.3 Source-to-Source Transformations 


The techniques used to analyze programs can be adapted to yield source-to-source 
transformations. An analysis program is simply a program that inspects trees; 
a source-to-source transformation takes a tree as input and generates a new tree 
as output. We consider two example source-to-source transformations in this sec- 
tion. The first takes process definitions as input and generates a definition for 
the rule process used by the tracing interpreter. The second takes a program and 
adds a short-circuit; this makes it possible to detect when the original program 
terminates. 

For convenience, we choose here to represent process definitions in a prepro- 
cessed form for the purpose of program transformations. A process definition is 
represented by a term of the form: 


{ ProcessName/Arity, Rules }


where Rules is a list of terms, each of which is structured as:


{ Head, Guard, Body }


Guard and Body are lists of terms representing guard tests and processes respectively.
Figure 9.2 illustrates the representation of the member process using this
format.


{member/3,
 [
  {member('_var'('E'),['_var'('E')|'_var'('_')],'_var'('R')),
   [],
   ['_var'('R') := true]},
  {member('_var'('E'),['_var'('E1')|'_var'('Es')],'_var'('R')),
   ['_var'('E') =\= '_var'('E1')],
   [member('_var'('E'),'_var'('Es'),'_var'('R'))]},
  {member('_var'('_'),[],'_var'('R')),
   [],
   ['_var'('R') := false]}
 ]}


Figure 9.2: Preprocessed process definition


9.3.1 Generating the Rule Form


The rule form required by the tracing interpreter in Section 9.2.2 represents each
rule in an original program by a new rule. For example, recall the member process
definition:


member(E,[E|_],R) :- R := true.
member(E,[E1|Es],R) :- E =\= E1 | member(E,Es,R).
member(_,[],R) :- R := false.


This is expressed in rule form as:


rule(member(E,[E|_],R),B) :- B := (R := true).
rule(member(E,[E1|Es],R),B) :- E =\= E1 | B := member(E,Es,R).
rule(member(_,[],R),B) :- B := (R := false).


Program 9.5 implements this transformation; it takes as input a process definition 
in preprocessed form and generates rules for the new process definition in the same 
form. 


-exports([form/4]).


form({_,Rs},VN,Rb,Re) :-                                % R1
    rules(Rs,VN,Rb,Re).


rules([{H,G,B}|Rs],VN,Rb,Re) :-                         % R2
    Rb := [{rule(H,{V,VN}),G,[{V,VN} := B]}|Rm],
    V := '_var',
    rules(Rs,VN,Rm,Re).

rules([],_,Rb,Re) :- Rb := Re.                          % R3


Program 9.5: Rule Transformation 


A process definition is transformed by transforming all of the rules (R1). A
rule H :- G | B is transformed to a new rule rule(H,VN) :- G | VN := B (R2). The
variable name VN is provided as an argument to the transformation; this must not
occur elsewhere in the process definition.

After careful scrutiny, the reader might consider rewriting the second clause in
Program 9.5 as:


rules([{H,G,B}|Rs],VN,Rb,Re) :-
    Rb := [{rule(H,{'_var',VN}),G,[{'_var',VN} := B]}|Rm],
    rules(Rs,VN,Rm,Re).


However, this prevents the transformation from being used on itself. The reason 
is that the structure {’_var’,VN} will be interpreted as a variable rather than a 
representation of a variable. 
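
To make the shape of this transformation concrete, here is a Python sketch of the same idea. It is not Program 9.5: it returns an ordinary list where the Strand program builds a difference list (Rb, Re), and it assumes that Strand tuples are modelled as Python tuples, with a variable written ("_var", name) and an assignment written (":=", Left, Right).

```python
def var(name):
    """Build the representation of a variable (assumed encoding for this sketch)."""
    return ("_var", name)


def form(definition, vn):
    """Turn each rule {H,G,B} into {rule(H,VN), G, [VN := B]}."""
    _name, rules = definition            # the name/arity component is not needed
    new_rules = []
    for head, guard, body in rules:
        new_head = ("rule", head, var(vn))
        new_body = [(":=", var(vn), body)]
        new_rules.append((new_head, guard, new_body))
    return new_rules
```

Applied to a definition containing the third member rule, with vn set to "B", the result is a single rule whose head is rule(member(_,[],R),B) and whose body assigns the original body to B. The self-application problem described above does not arise in this sketch, because Python data and Python variables are distinct kinds of object.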


9.3.2 Adding a Short-Circuit 


We now consider a more complex transformation: adding a short-circuit. Fig- 
ure 9.3 illustrates this transformation by showing the member process definition 
both before and after transformation. 


(a) Original program
member(E,[E|_],R) :- R := true.
member(E,[E1|Es],R) :- E =\= E1 | member(E,Es,R).
member(_,[],R) :- R := false.


(b) Program with short-circuit
member(E,[E|_],R,Lc,Rc) :- assign(R,true,D), link(D,Lc,Rc).
member(E,[E1|Es],R,Lc,Rc) :- E =\= E1 | member(E,Es,R,Lc,Rc).
member(_,[],R,Lc,Rc) :- assign(R,false,D), link(D,Lc,Rc).


link([],Lc,Rc) :- Rc := Lc.


Figure 9.3: Adding a Short-Circuit 


Program 9.6 implements the transformation and is shown in full as it provides 
a starting point for many similar transformations. The program takes as input a 
process definition in the preprocessed form and generates a new process definition. 
It also takes as input a list of unique variable names Vs for the short-circuit; 
for simplicity, we assume that this list contains a sufficient number of names to 
complete the transformation. 

Each rule in the original process definition is transformed in turn (R1). Trans- 
forming a rule involves transforming its head and body (R2). Transforming the 
head involves adding two additional arguments L and R taken from the list Vs 
(R4,5). Transforming the body involves threading a short-circuit through the 
body processes. The short-circuit is constructed using variables from Vs; it begins 
at L and ends at R (R7-8). An empty body is transformed to a process that closes 
the short-circuit (R9). 


-exports([sc/3]).


sc({Nm/A,Rs},Vs,NP) :-                                      % R1
    NP := {Nm/NA,NRs}, NA is A+2, rules(Rs,Vs,NRs).


rules([{H,G,B}|Rs],Vs,Rs1) :-                               % R2
    Rs1 := [{NH,G,NB}|Rs2],
    change(H,Vs,NH), body(B,Vs,NB),
    rules(Rs,Vs,Rs2).
rules([],_,Rs) :- Rs := [].                                 % R3

change(H,[L,R|_],NH) :-                                     % R4
    string(H) | NH := {H,{V,L},{V,R}}, V := '_var'.
change(H,[L,R|_],NH) :-                                     % R5
    tuple(H) |
    length(H,A), A1 is A + 1, A2 is A + 2,
    make_tuple(A2,NH), copy_args(A,H,NH),
    put_arg(A1,{V,L},NH), put_arg(A2,{V,R},NH),
    V := '_var'.

body(B,[L,R|Vs],NB) :- body(B,NB,Vs,L,R).                   % R6
body([X1,X2|Xs],Nb,[M,T|Vs],L,R) :-                         % R7
    process(X1,Nb,Ne,L,M,T), body([X2|Xs],Ne,Vs,M,R).
body([X],Nb,[M|Vs],L,R) :- process(X,Nb,[],L,R,M).          % R8
body([],Nb,_,L,R) :- Nb := [{V,L} := {V,R}], V := '_var'.   % R9

process(X:=Y,Nb,Ne,L,M,T) :-                                % R10
    Nb := [assign(X,Y,{V,T}), link({V,T},{V,L},{V,M})|Ne],
    V := '_var'.
process(P,Nb,Ne,L,M,_) :-                                   % R11
    otherwise | change(P,[L,M],NP), Nb := [NP|Ne].

copy_args(N,Head,AHead) :-                                  % R12
    N > 0 |
    get_arg(N,Head,Arg), put_arg(N,AHead,Arg),
    N1 is N - 1, copy_args(N1,Head,AHead).
copy_args(0,_,_).                                           % R13


Program 9.6: Preprocessor that Adds a Short-Circuit 




For brevity, only a single predefined process, ‘:=’, is recognized by this program. 
This is translated into an assign process plus a link process. The link process must 
be added to the complete transformed program. It is defined as: 


link([],L,R) :— L := R. 
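
The idea behind the short-circuit can be sketched in Python. This is an analogy, not the Strand mechanism: Strand closes a segment by assigning one circuit variable to another, whereas the sketch below swaps in a small union-find structure over variable names to record which circuit variables have been connected. The class and method names are hypothetical.

```python
class Circuit:
    """Records which circuit variables have been linked together."""

    def __init__(self):
        self.parent = {}

    def find(self, v):
        """Return the representative of the group containing variable v."""
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            v = self.parent[v]
        return v

    def link(self, left, right):
        """Called when a process terminates: close its segment of the circuit."""
        self.parent[self.find(left)] = self.find(right)

    def closed(self, start, end):
        """True once every segment between start and end has been closed."""
        return self.find(start) == self.find(end)
```

Suppose three processes hold the segments (L, M1), (M1, M2) and (M2, R). Whatever the order in which they terminate and call link, closed("L", "R") becomes true only after all three have done so, which is how an observer holding both ends of the circuit detects termination of the whole computation.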


9.4 Summary 


We write metaprograms to analyze other programs, to transform them, and to 
simulate their behavior. This chapter has demonstrated all three of these uses. In 
Strand, programs can be naturally represented as data. Thus, metaprograms can 
be developed and refined using techniques discussed throughout this book. 


Exercises 


9.1 Write an analysis tool that takes a program as input and generates a new 
program in which all lists are represented as binary trees using tuples. For 
example, the list [a,b|Xs] will be translated to the tree {a,{b,Xs}}. 


9.2 Write an interpreter that provides more powerful tracing facilities than Pro- 
gram 9.4. For example: 


• The user should be able to request that tracing begin after a specified
number of reductions.
• A spy-point feature should permit the user to specify that only certain
processes are to be traced.
• An interactive facility should permit the user to specify, at each reduction,
whether a particular process (and its offspring) are to be traced
or executed freely.


9.3 Construct representations of Program 9.2 in the styles illustrated in Figures 
9.1 and 9.2. 


9.4 Write a preprocessor that takes a term representing a Strand program (in the 
form illustrated in Figure 9.1) and generates a term of the form illustrated 
in Figure 9.2. 


9.5 Write an analysis program that takes a set of Strand process definitions and 
reports: 
• Processes called but not defined.
• Processes defined but not called.
• Singleton variables (variables that occur only once in a rule).


9.6 Write a source-to-source transformation that replaces all singleton variables
in a program by the anonymous variable ‘_’.


9.7 Write a source-to-source transformation that takes a program and generates
a new program capable of tracing its execution in the manner of Program
9.4.


9.8 Write a source-to-source transformation that takes two Strand modules M1 
and M2 and generates a new module M3 that does not contain any inter- 
module calls between M1 and M2. This transformation must ensure that no 
name clashes occur. 


Part III 


Case Studies 










Chapter 10 


Reasoning About Equality 


Carl Kesselman 


The Aerospace Corporation 


Los Angeles, California 


Stephen Taylor 


California Institute of Technology 
Pasadena, California 


This chapter considers a non-trivial programming exercise and shows its complete 
solution. The problem, from the domain of formal verification, is a graph algo- 
rithm for proving theorems about the equality of terms. Several lessons are to 
be learned from this exercise. Building on ideas presented in Chapters 4 and 5, 
the program illustrates how a mutable data structure can be implemented using 
perpetual processes. In addition, it shows how various algorithms can be im- 
plemented using message passing. The study highlights the fact that although 
concurrent languages provide a good medium for expressing a problem, they do 
not, in and of themselves, yield effective parallel programs. The hard problems, 
such as synchronization, deadlock, starvation and algorithm design, must still be 
addressed by the programmer. 


10.1 Introduction 


We consider the problem of constructing a parallel algorithm to compute the
congruence closure of a relation. Our interest in this algorithm derives from its
importance to the field of formal verification, i.e., the process of mathematically
proving that an implementation is consistent with its specification.

Verification is used to increase our confidence that critical systems will per- 
form properly. There are a number of examples where this has proved successful. 
The State Delta Verification System (SDVS) [50,52] and the Stanford Pascal Ver- 
ifier [49] have been used to show that a given program behaves consistently with 
its specification. In addition, SDVS has been used to prove that the instruction 
set of the Arpanet packet switching node (IMP) is correctly implemented by its 
microcode. 

Formal verification is a computationally intensive symbolic processing task. 
For example, the IMP proof required days to complete on a Symbolics 3600. We 
would like to examine the feasibility of increasing the performance of verification 
tools. This would allow more complex verification tasks to be undertaken. Since 
these tasks are computationally intensive, parallel processing is a natural direction 
to investigate. 

Reasoning about the equality of terms is a central aspect of verification. For
example, if a verifier determines that f(f(f(a))) = a and f(f(f(f(f(a))))) = a, it
must then be able to determine that the assertion f(a) ≠ a is inconsistent: replacing
the inner f(f(f(a))) in the second equality by a gives f(f(a)) = a, and substituting
this into the first gives f(a) = a. Questions of this type can be answered using the
congruence closure algorithm since it provides an efficient decision procedure for
a quantifier-free theory of equality.

Several algorithms for computing the congruence closure have appeared in the
literature. The algorithm used in this study was originally described by Oppen
and Nelson [56]; the theoretical lower bound on its time complexity is O(m^2),
where m is a metric of the problem size. An algorithm with lower complexity
exists (O(m log^2 m)) [27] but it involves hashing techniques and is more complex
to implement. In practice, m is typically small and thus the more complex algorithm
does not provide substantial improvement in performance [56] even on
uniprocessors.

A cursory inspection of typical closure operations indicates a low granularity 
of computation. Thus, we would not expect parallel execution of the algorithm to 
result in significant performance improvements. However, since congruence closure 
is invoked frequently during the course of a single proof, an increase in throughput 
is likely to be effective. As a result, the goal of parallel execution is to overlap 
many invocations of the algorithm. 


10.2 The Sequential Algorithm 


The congruence closure algorithm manipulates a graph that represents structures 
called terms. Each term is either an atom or a function. A function is comprised 
of a function symbol applied to a fixed number of arguments; each argument is 
itself a term. The number of arguments of a term T is denoted by δ(T).

The input to the algorithm is a set of equalities between terms. After ensuring 
that the terms are represented in the graph, the congruence closure algorithm 
modifies the graph to reflect the equality relation. During a proof the graph is 
altered with the addition of new terms and new equality relations. 




Functions have no inherent meaning. For example, the term 1 + 2 is only a 
function + with arguments 1 and 2; it is not necessarily equal to 3. Terms are 
represented in the graph as nodes and edges correspond to term/subterm rela- 
tionships. Figure 10.1 illustrates the graph corresponding to the term f(a, g(a, b)). 
The node representing the term g(a,b) is a subterm of f(a,g(a,b)). The node 
f (a, g(a, b)) is said to be a predecessor of the term g(a, b). 







Figure 10.1: Graph for term f(a, g(a, b)) 


Every term in the graph belongs to an equivalence class. When new terms 
are added to the graph, they are initially placed in an equivalence class containing 
only themselves. Equivalence classes may be merged in response to explicit equal- 
ity assertions. For example, given the assertion g(a, b) = b the graph in Figure 10.1 
is augmented to indicate that the nodes g(a,b) and b are both in the same class. 
Figure 10.2 indicates the new class created by this assertion. 





Figure 10.2: Term graph with g(a, b) = b 


Two terms, T1 and T2, are said to be congruent if they both have the same
number of arguments (δ(T1) = δ(T2)), their function symbols are equivalent and
their arguments are pairwise equivalent. If two terms are congruent, then they
are also equivalent; however, equivalent terms are not necessarily congruent. For
example, if f(a) = g(b), a need not be equivalent to b.

Two classes are made equivalent by merging their members into a single class. 
Merging may cause predecessors of the class members to become congruent. For 
example, given the graph shown in Figure 10.2, consider the effect of the equality 
assertion: f = g. This causes the nodes corresponding to f and g to be merged 
in a single equivalence class as shown in Figure 10.3. 





Figure 10.3: Term graph with f =g 


Since f = g, the term f(a, g(a, b)) may be rewritten as g(a, g(a, b)). Because
g(a, b) is equal to b, this is in turn equal to g(a, b), and hence to b. Thus the
term f(a, g(a, b)) is equivalent to both g(a, b) and b.
The congruence closure algorithm ensures that this new equivalence is explicitly
represented in the graph. The resulting structure is shown in Figure 10.4.





Figure 10.4: Term Graph After Equivalence Propagation 


The process of asserting equivalence between two terms can be described by two 
basic algorithms: find and closure. An abstract definition of the find algorithm 
is shown in Program 10.1. It takes a term as input and attempts to determine 


10.2. The Sequential Algorithm 239 


if a node congruent to this term is already in the graph. Atomic terms can be 
located directly using their name. Non-atomic terms are located by using the find 
algorithm recursively. If a congruent node already exists, it must be a predecessor 
of nodes that are congruent to the term’s arguments. Thus, the node can be 
located by searching the predecessors. If an existing node cannot be located, then 
a new node is created. 


find(t) =
    if (δ(t) = 0) then
        if exists(t) then
            return atomic_node(t)
        else
            return make_node(t)
    else
        n := make_node(t)
        for each t_i in arguments(t)
            n_i := find(t_i)
            for each p in predecessors(n_i)
                if congruent(p,n) then
                    return(p)
        return n


Program 10.1: The Find Algorithm 


The find algorithm uses three auxiliary functions: exists checks to see if an
atomic node exists; if it does, atomic_node returns it. If a congruent node cannot
be found, make_node is used to create a new node in the graph. In addition,
the find algorithm uses the congruence test defined in Program 10.2.


congruent(n1,n2) =
    if (δ(n1) ≠ δ(n2)) then
        return FALSE
    else
        for each (a1,a2) ∈ {(argument(i,n1), argument(i,n2)) | 1 ≤ i ≤ δ(n1)}
            if (class(a1) ≠ class(a2)) then
                return FALSE
        return TRUE


Program 10.2: The Congruence Test 


Program 10.3 is an abstract definition of a sequential closure algorithm. This 
algorithm asserts that two terms in the graph are equal and propagates necessary
changes throughout the graph. It uses an auxiliary function union that combines
the members of two equivalence classes. 


closure(u,v) =
    class_u := class(find(u))
    class_v := class(find(v))
    if (class_u ≠ class_v) then
        p_u := predecessors(class_u)
        p_v := predecessors(class_v)
        union(class_u,class_v)
        for each (x,y) in p_u × p_v
            if (congruent(x,y)) then closure(x,y)


Program 10.3: The Closure Algorithm 
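
The three abstract programs above can be exercised on the running example. The following Python sketch is only an illustration under stated assumptions, not the Strand implementation developed later in the chapter. It assumes that atoms are strings, that a compound term is a tuple whose first element is the function symbol (so that f = g can be asserted like any other equality), and that a new node is created only after the predecessor search fails. The names TermGraph, assert_equal, and equivalent are hypothetical.

```python
class TermGraph:
    """Term graph with equivalence classes (illustrative sketch)."""

    def __init__(self):
        self.atoms = {}     # atom name -> node id
        self.args = []      # node id -> tuple of argument node ids
        self.cls = []       # node id -> class id
        self.members = {}   # class id -> set of member node ids
        self.preds = []     # node id -> set of predecessor node ids

    def _make_node(self, args):
        n = len(self.args)
        self.args.append(tuple(args))
        self.cls.append(n)              # a new node starts in its own class
        self.members[n] = {n}
        self.preds.append(set())
        for a in args:
            self.preds[a].add(n)        # record the upward link
        return n

    def _class_preds(self, c):
        # Predecessors of all members of class c.
        return {p for m in self.members[c] for p in self.preds[m]}

    def _same_classes(self, xs, ys):
        # Congruence test: same arity and pairwise equivalent arguments.
        return len(xs) == len(ys) and all(
            self.cls[a] == self.cls[b] for a, b in zip(xs, ys))

    def find(self, t):
        if isinstance(t, str):                      # atomic term
            if t not in self.atoms:
                self.atoms[t] = self._make_node(())
            return self.atoms[t]
        subs = [self.find(s) for s in t]
        # A congruent node must be a predecessor of the first subterm's class.
        for p in self._class_preds(self.cls[subs[0]]):
            if self._same_classes(self.args[p], subs):
                return p
        return self._make_node(subs)

    def closure(self, u, v):
        cu, cv = self.cls[u], self.cls[v]
        if cu == cv:
            return
        pu, pv = self._class_preds(cu), self._class_preds(cv)
        for m in self.members.pop(cv):              # union of the two classes
            self.cls[m] = cu
            self.members[cu].add(m)
        for x in pu:                                # propagate upward
            for y in pv:
                if self._same_classes(self.args[x], self.args[y]):
                    self.closure(x, y)

    def assert_equal(self, t1, t2):
        self.closure(self.find(t1), self.find(t2))

    def equivalent(self, t1, t2):
        return self.cls[self.find(t1)] == self.cls[self.find(t2)]


g = TermGraph()
term = ("f", "a", ("g", "a", "b"))      # f(a, g(a, b))
g.find(term)
g.assert_equal(("g", "a", "b"), "b")    # g(a, b) = b
g.assert_equal("f", "g")                # f = g
```

After the two assertions, g.equivalent(term, "b") holds, which corresponds to the situation shown in Figure 10.4. Note that equivalent calls find and may therefore add nodes for terms not yet in the graph.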


10.3 Overview of the Approach 


The preceding discussion indicates that the congruence closure algorithm manipu- 
lates two types of entity: nodes and classes. For convenience, we will also consider 
a predecessor set to be an entity; this corresponds to the set of predecessors of all 
the members of a specific class. 

Both the find and closure algorithms require the ability to determine if two 
nodes are in the same class. Fast comparison of classes can be achieved by assigning 
each class a unique name. The equivalence of two nodes can then be determined 
in constant time by comparing the names of their respective classes. However, 
naming a class is not without problems. 

Nodes can become members of new classes as a result of the union operation 
in a closure. This change of class membership must be reflected as a change in the 
class name associated with all members of the class. We would like this operation 
to take place in constant time. In other programming languages, this would be 
achieved by storing the class name in a variable shared by all members of the 
class. A change of name would be reflected by a modification of the variable. The 
single-assignment rule used in Strand precludes this form of update. However, an 
alternative solution can be employed in which the class name is encapsulated in a 
perpetual process. A change in name can be effected in constant time by sending 
the process a message. 
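
The idea of encapsulating the class name in a process can be sketched in Python. This is a loose analogy only: a Python object with a message queue stands in for a Strand perpetual process, and a list passed inside a message stands in for the unbound variable of an incomplete message. The names ClassProcess and class_of and the message shapes are assumptions made for this illustration.

```python
from collections import deque


class ClassProcess:
    """Holds a class name; its state changes only by handling messages."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()

    def send(self, message):
        self.inbox.append(message)

    def run(self):
        # Handle queued messages in arrival order.
        while self.inbox:
            kind, payload = self.inbox.popleft()
            if kind == "rename":
                self.name = payload           # one step, however many members
            elif kind == "class":
                payload.append(self.name)     # fill in the reply slot


def class_of(node):
    """Ask a node's class process for its current name."""
    reply = []
    node["class"].send(("class", reply))
    node["class"].run()
    return reply[0]


shared = ClassProcess(7)
n1 = {"term": "g(a,b)", "class": shared}
n2 = {"term": "b", "class": shared}

before = (class_of(n1), class_of(n2))
shared.send(("rename", 3))                    # a single message renames the class
after = (class_of(n1), class_of(n2))
```

Both nodes observe the new name after one message, without either node being updated individually.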

This approach leads to a process-oriented formulation of the problem along
the lines described in Chapter 4. Three basic processes correspond to the three 
entity types in the problem: nodes, classes, and predecessor sets. Time-critical 
operations are achieved in constant time by sending messages between these pro- 
cesses. We will now discuss the functionality and implementation of each of these 
processes. 


Class Processes. Program 10.4 shows the implementation of the class pro- 
cess. A class process represents an equivalence class of nodes in the graph. The 
state encapsulated by a class process includes a list of nodes that are members of 
the class (Ms). Each class is identified by a unique class number (Cn). A stream 
to the class allows members of the class to communicate with it (Is). Movement
up through the graph is achieved through the predecessors of the nodes of the 
class; these are accessed by a stream to the predecessor set associated with the 
class (Preds). 


make_class(In,Cn,M,Preds) :-                                    % R1
    merger([merge(In)],Is),
    class(Is,Cn,[M|Me]/Me,Preds).

class([emerge(BNode,L,R)|Is],ACn,AMs,APreds) :-                 % R2
    BNode := [class(BCn)|BNode1],
    class([cmp_class(BCn,BNode1,L,R)|Is],ACn,AMs,APreds).

class([cmp_class(Cn,BNode,L,R)|Is],Cn,AMs,APreds) :-            % R3
    R := L, BNode := [],
    class(Is,Cn,AMs,APreds).
class([cmp_class(BCn,BNode,L,R)|Is],ACn,AMs,APreds) :-          % R4
    ACn =\= BCn |
    BNode := [union(AMs,APreds,Is,L,R)].
class([union(AMb/AMe,APreds,L,R)|Is],BCn,BMb/BMe,BPreds) :-     % R5
    BMe := AMb, BPreds := [pred_closure(APreds,L,R)|BPreds1],
    class(Is,BCn,BMb/AMe,BPreds1).

class([class(N)|Is],Cn,Ms,Preds) :-                             % R6
    N := Cn,
    class(Is,Cn,Ms,Preds).
class([],_,_,_).                                                % R7
class([ToPreds|Is],Cn,Ms,Preds) :-                              % R8
    otherwise |
    Preds := [ToPreds|Preds1],
    class(Is,Cn,Ms,Preds1).


Program 10.4: Class Process 


A new class object is created by spawning a class process with an associated 
merger (R1). A class process supports two basic operations: forming the union 
of two classes and obtaining the class identifier. The union operation on two 
classes, A and B, described in Section 10.2 is implemented by the union message. 
This message combines the members of class A with those in class B and invokes 
congruence closure (R5). An incomplete class message provides the class identifier 
to another process (R6). The algorithm for merging and closure results in a 
complex interaction between all process types; the cmp_class and emerge messages
(R2-4) support this algorithm; the details are described in Section 10.4.2. All other
messages are delegated to the predecessor process associated with the class (R8). 

Unique class identifiers are generated by a process whose function is to gen- 
erate all class processes. Program 10.5 shows the implementation of this process. 
The class_generator ensures that each class obtains a unique identifier by allocat- 
ing numbers sequentially (R2). If the class_generator were to become a bottle- 
neck, other well-known distributed naming techniques could be applied. Finally, 
the auxiliary function arg_class is used to obtain the class identifiers for a set of 
classes (R4-6). It achieves this by sending class messages to each class (R5) and 
constructing a tuple of the replies (R4). 
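
The two roles described here, sequential allocation of class identifiers and collection of the identifiers of a term's arguments into a tuple, can be sketched in Python as follows. This is an illustrative analogy only; ClassGenerator and arg_classes are hypothetical names, and a dictionary stands in for a class process.

```python
import itertools


class ClassGenerator:
    """Hands out class identifiers sequentially, starting at 0."""

    def __init__(self):
        self._next_id = itertools.count(0)

    def new_class(self, member):
        # Each new class receives the next unused identifier.
        return {"id": next(self._next_id), "members": [member]}


def arg_classes(classes):
    """Collect the identifiers of a sequence of classes into a tuple."""
    return tuple(c["id"] for c in classes)


gen = ClassGenerator()
class_a = gen.new_class("a")
class_b = gen.new_class("b")
ids = arg_classes([class_b, class_a, class_b])
```

Because a single generator allocates every identifier, two distinct classes can never receive the same number; the cost is that every class creation passes through one place.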


make_class_generator(I) :-                              % R1
    merger(I,Is), class_generator(Is,0).

class_generator([new_class(C,M,P)|Is],Id) :-            % R2
    make_class(C,Id,M,P),
    Id1 is Id + 1,
    class_generator(Is,Id1).
class_generator([],_).                                  % R3
arg_class(Args,NewArgs,ArgClasses) :-                   % R4
    length(Args,N),
    make_tuple(N,NewArgs),
    make_tuple(N,ArgClasses),
    arg_class(N,Args,NewArgs,ArgClasses).

arg_class(N,Args,NewArgs,ArgClasses) :-                 % R5
    N > 0 | N1 is N - 1,
    put_arg(N,Args,[class(C)|Class1]),
    put_arg(N,NewArgs,Class1),
    put_arg(N,ArgClasses,C),
    arg_class(N1,Args,NewArgs,ArgClasses).
arg_class(0,_,_,_).                                     % R6


Program 10.5: Auxiliary Class Processes 


Node Processes. A node process, shown in Program 10.6, implements the 
representation of a term in the graph. A new node is placed in a new equivalence 
class when it is created. This class is created by sending a message to the class 
generator process (Program 10.5). In addition, an empty predecessor process is 
created and associated with the newly created class (R1). Each node encapsulates 
a stream to its class (Class) and a list of streams to nodes that represent the 
arguments of the term (Args); it receives messages on an input stream (Is). 


make_node(Streams,As,Cg) :-                                 % R1
    merger([merge(S1)|Streams],Is),
    Cg := [new_class(C,S1,P)|Cg1],
    make_preds(P,Cg1),
    node(Is,C,As).

node([args(Cns)|Is],Class,Args) :-                          % R2
    arg_class(Args,Args1,Cns),
    node(Is,Class,Args1).
node([union(AMs,APreds,AIs,L,R)|BIs],BClass,BArgs) :-       % R3
    BClass := [union(AMs,APreds,L,R),
               merge(AIs)|BClass1],
    node(BIs,BClass1,BArgs).

node([],_,_).                                               % R4
node([ToClass|Is],Class,Args) :-                            % R5
    otherwise |
    Class := [ToClass|Class1],
    node(Is,Class1,Args).


Program 10.6: Node Process 


An incomplete args message is used to return the class identifiers of all argu- 
ments of the term represented by the node (R2). Union messages are forwarded 
to the node’s class, but are intercepted so that a new stream to the class can be 
returned for use by the merging algorithm (R3). All other messages are forwarded 
to the node’s class process directly (R5). 

Predecessor Processes. Upward traversal of the graph is achieved through 
the predecessor set associated with a class. Upward traversal is used in both the 
find and closure algorithms. The predecessor process is the destination of most of 
the messages sent during these algorithms. Program 10.7 implements this process. 


The basic operations on a predecessor process are to add a new node to a
predecessor list (R2) and to return a list of streams to predecessor nodes (R3). The find
algorithm described in Section 10.2 uses the find_congruent message to determine
if a node is already in the graph. This is achieved by searching the predecessor list
for a congruent node (R4). The pred_closure messages (R5,6) are used to propagate
the congruence closure up through the graph as described in Section 10.2.

A difficulty in using perpetual processes can arise when a data structure such 
as a list is returned by an incomplete message. If the list contains streams to 
other processes, two different processes may gain access to the same stream. If 
both processes attempt to generate the stream, Strand’s single-assignment rule 
would be violated. This problem is eliminated by using the auxiliary function split 
(R8,9) which generates fresh streams to be returned in messages. 


make_preds(I,Cg) :- merger(I,Is), preds(Is,[],Cg).              % R1
preds([add_pred(Pred)|Is],Preds,Cg) :-                          % R2
    preds(Is,[Pred|Preds],Cg).
preds([preds(Preds1)|Is],Preds,Cg) :-                           % R3
    split(Preds,Preds1,[],Preds2,[]),
    preds(Is,Preds2,Cg).
preds([find_congruent(T,SubNs,N,L,R)|Is],Preds,Cg) :-           % R4
    arg_class(SubNs,SubNs1,SubCns),
    split(Preds,Preds1,[],Preds2,[]),
    Cg := [merge(Cg1)|Cg2],
    find_congruent(Preds1,SubCns,T,SubNs1,N,Cg1,L,R),
    preds(Is,Preds2,Cg2).
preds([pred_closure(APs,L,R)|Is],BPreds,Cg) :-                  % R5
    split(BPreds,BPreds1,Tail,BPreds2,[]),
    APs := [pred_closure(Tail,BPreds2,L,R)],
    preds(Is,BPreds1,Cg).
preds([pred_closure(Tail,BPreds,L,R)],APreds,Cg) :-             % R6
    Cg := [],
    split(APreds,APreds1,[],Tail,[]),
    closure(APreds1,BPreds,L,R).
preds([],_,_).                                                  % R7

split([H|Hs],As,At,Bs,Bt) :-                                    % R8
    H := [merge(A)|B], As := [A|As1], Bs := [B|Bs1],
    split(Hs,As1,At,Bs1,Bt).
split([],As,At,Bs,Bt) :- As := At, Bs := Bt.                    % R9


Program 10.7: The Predecessor Process 


10.4 Implementing the Graph Algorithms 


Nodes are created and assertions are made by sending messages to a top-level 
process called prover. Program 10.8 shows the implementation of this process. 
A prover process begins execution by starting a dictionary process and a class 
generator. The dictionary process indexes input streams to the atomic nodes 
currently in the graph. The class generator is responsible for assigning unique 
class identifiers to newly created classes. 


The prover process services two main messages: find (R2) and equal (R3).
The incomplete find message invokes the find algorithm described in Section 10.2
to locate a node in the graph; it returns a stream to this node (N). The equal
message is used to assert the equality of two terms (R3). This is achieved by
first finding the nodes representing both terms. The merge algorithm described
in Section 10.2 is then invoked to form the closure of the equivalence classes of
the two nodes (R4). Note that a new request is not processed until the previous
one has completed. This ensures that find and equal messages do not overlap in
operation; we will discuss this aspect of the program in Section 10.5.


prover(Is) :-                                           % R1
    prover(done,Is,Gs),
    make_class_generator(Cg),
    dict(Gs,[],Cg).

prover(done,[find(T,N)|Is],Gs) :-                       % R2
    Gs := [find(T,N,done,R)|Gs1],
    prover(R,Is,Gs1).
prover(done,[equal(T1,T2)|Is],Gs) :-                    % R3
    Gs := [find(T1,N1,done,M),find(T2,N2,M,R)|Gs1],
    prover(R,[emerge(N1,N2)|Is],Gs1).
prover(done,[emerge(N1,N2)|Is],Gs) :-                   % R4
    N1 := [emerge(N2,done,R)],
    prover(R,Is,Gs).
prover(_,[],_).                                         % R5


Program 10.8: Equality Prover Top Level

It is interesting to note that the program presented here cannot be terminated 
by closing off the input stream to prover. This is because there will always be 
circularity in the process structure. This is completely legal in Strand since it 
is a circular process structure, not a circular data structure. Termination can 
be achieved by detecting when all outstanding work is completed and sending a 
termination message to all active processes. 


10.4.1 The Find Algorithm 


The find algorithm is a recursive algorithm used to locate or create a node in 
the graph. The nodes representing atoms reside in a dictionary that relates atom 
names to atomic nodes in the graph. If an atom cannot be found in the dictio- 
nary, a new node is created and the dictionary updated. A node representing a 
nonatomic term is created only if there is no congruent node already in the graph. 
The dynamic behavior of the congruence relation precludes the use of the dictio- 
nary for locating nonatomic terms. The alternative used is to recursively find the 
subterms of a term and search the predecessors of the subterms for a congruent 
node. If such a node does not exist, a new node is added to the graph and this 
new node is added to the predecessor set of the subterms. 

Program 10.9 shows how a find operation is initiated and atomic nodes are 
located. The algorithm is initiated when the dictionary receives a find message
(R1) from the prover (Program 10.8). This message contains the term to be
located in the graph and a stream to the node representing the term. If the term 
is atomic (R3), the dictionary is searched recursively (R5). If the atom is found 
in the dictionary, the stream in the original find message is merged into the node 
process representing the atom (R6). If the atom is not found, a new node in a 
unique equivalence class is generated and a stream to that node is entered in the 
dictionary (R7). The stream in the find message is merged into the new node. If 
a node is not atomic, the graph is searched for a congruent node (R4). 


dict([find(T,N,done,R)|Is],D,Cg) :-                     % R1
    Cg := [merge(Cg1)|Cg2],
    find_term(T,N,done,R,D,D1,Cg2),
    dict(Is,D1,Cg1).

dict([],_,_).                                           % R2
find_term(T,N,L,R,D,D1,Cg) :-                           % R3
    string(T) | find_atom(T,N,L,R,D,D1,Cg).
find_term(T,N,L,R,D,D1,Cg) :-                           % R4
    tuple(T) | find_tuple(T,N,L,R,D,D1,Cg).
find_atom(T,N,L,R,[{T1,N1}|D],D1,Cg) :-                 % R5
    T =\= T1 |
    D1 := [{T1,N1}|D2], find_atom(T,N,L,R,D,D2,Cg).
find_atom(T,N,L,R,[{T,N1}|D],D1,Cg) :-                  % R6
    N1 := [merge(N)|N1s], D1 := [{T,N1s}|D],
    Cg := [], R := L.
find_atom(T,N,L,R,[],D1,Cg) :-                          % R7
    D1 := [{T,N1}], R := L,
    make_node([merge(N1)|N],T,[],Cg).


Program 10.9: Finding Atomic Nodes 


Program 10.10 shows how nonatomic nodes are found in the graph. Before
adding a nonatomic term to the graph, it is necessary to ensure that a term
congruent to it is not already represented in the graph. This is achieved by locating
streams to the function symbol and arguments of the term (R2). The arguments
to a congruent node would have the same equivalence classes as the arguments
of the term to be found. Furthermore, if a congruent node exists, it must be the
predecessor of all the argument nodes. For efficiency reasons, only the predecessors
of the first argument are inspected (R8,9). This is achieved by sending a
find_congruent message to the predecessor process of the first argument (R1). The
predecessors are searched for a congruent node (R4,7). If a congruent node is found,
the stream in the original find message is merged into the node (R6); otherwise, a
new node is created (R5).


find_tuple(T,N,L,R,D,D1,Cg) :-                                  % R1
    Index := [find_congruent(T,SubNs,N,M,R)],
    length(T,A),
    make_tuple(A,SubNs),
    find_subterms(A,T,L,M,SubNs,Index,D,D1,Cg).

find_subterms(A,T,L,R,SubNs,Index,D,D2,Cg) :-                   % R2
    A > 0 |
    A1 is A - 1, Cg := [merge(Cg1)|Cg2],
    get_arg(A,T,SubT),
    put_arg(A,SubNs,SubN),
    index_term(A,SubN0,SubN,Index),
    find_term(SubT,SubN0,L,M,D,D1,Cg1),
    find_subterms(A1,T,M,R,SubNs,Index,D1,D2,Cg2).
find_subterms(0,_,L,R,_,_,D,D1,Cg) :-                           % R3
    R := L, D1 := D, Cg := [].

find_congruent([P|Ps],SubCns,T,SubNs,N,Cg,done,R) :-            % R4
    P := [args(Cns)|P1],
    find_congruent1(P1,Ps,SubCns,T,SubNs,N,Cns,Cg,done,R).
find_congruent([],_,T,SubNs,N,Cg,done,R) :-                     % R5
    add_pred(SubNs,SubNs1,N1,done,R),
    make_node([merge(N)|N1],T,SubNs1,Cg).

find_congruent1(P,_,SubCns,_,_,N,SubCns,Cg,L,R) :-              % R6
    R := L, Cg := [], P := N.
find_congruent1(_,Ps,SubCns,T,SubNs,N,Cns,Cg,L,R) :-            % R7
    Cns =\= SubCns |
    find_congruent(Ps,SubCns,T,SubNs,N,Cg,L,R).

index_term(A,N0,N,_) :- A =\= 2 | N0 := N.                      % R8
index_term(2,N0,N,Index) :- N0 := [merge(Index)|N].             % R9


Program 10.10: Finding Non-atomic Nodes 


Program 10.11 shows how the upward links in the graph are created. When 
a new non-atomic node is created, it has streams to all of its arguments (Pro- 
gram 10.10). However, the arguments must be notified that they have a new 
predecessor. This is achieved by the auxiliary function add_pred (R1-3) which 
sends add_pred messages to each argument. 


add_pred(Args,NewArgs,N,L,R) :-                         % R1
    length(Args,A),
    make_tuple(A,NewArgs),
    add_pred(A,Args,NewArgs,N,L,R).

add_pred(A,Args,NewArgs,N,L,R) :-                       % R2
    tuple(Args), tuple(NewArgs), A > 0 |
    A1 is A - 1, N := [merge(N1)|N2],
    put_arg(A,Args,[add_pred(N1)|SubN1]),
    put_arg(A,NewArgs,SubN1),
    add_pred(A1,Args,NewArgs,N2,L,R).
add_pred(0,_,_,_,L,R) :-                                % R3
    R := L.


Program 10.11: Updating the Predecessor Sets 


10.4.2 Adding Equalities 


The graph representing term structures is modified in response to explicit equality
assertions of the form equal(t1,t2). The modifications to the term graph take
place in two operations: merging and closure. Merging combines the classes of
t1 and t2 into a single class. This may result in implicit equalities that are not
yet represented in the graph. These are made explicit by the closure operation.
The examples in Section 10.2 illustrate merging and closure in the context of the
sequential algorithm.

Recall that when an equality assertion between two terms is made, the find
algorithm is used to obtain streams to the nodes representing these terms
(Program 10.8). We will refer to these nodes as n1 and n2. Merging is initiated
by sending an emerge message to n1. This message contains the stream to n2 and
is forwarded to the class of n1. The class number of n1 is obtained and compared
with that of n2. If they are identical, then these two nodes are already in the same
class and the classes need not be merged. If the classes are not already equivalent,
then one of the two classes ceases to exist and yields its state to the other class in
a union message. All subsequent messages destined for the class that has ceased
to exist are routed to the new composite class. The code for class merging can be
found in the description of the various object types. The starting point for this
chain of actions is the emerge message in Program 10.8.
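
The merging step just described, in which one class yields its state to the other and later traffic is rerouted, can be sketched in Python. This is an analogy under stated assumptions rather than a transcription of the Strand code: a forward reference stands in for merging the defunct class's input stream into the surviving class, and the names ClassRecord, resolve, and merge are hypothetical.

```python
class ClassRecord:
    """Hypothetical record standing in for a class process."""

    def __init__(self, cid, member):
        self.cid = cid
        self.members = [member]
        self.forward = None      # set once this class has ceased to exist


def resolve(c):
    """Follow forwarding links to the class that currently exists."""
    while c.forward is not None:
        c = c.forward
    return c


def merge(a, b):
    """Merge class a into class b unless they are already the same class."""
    a, b = resolve(a), resolve(b)
    if a.cid == b.cid:
        return b                        # identical class numbers: nothing to do
    b.members.extend(a.members)         # a yields its state to b
    a.members = []
    a.forward = b                       # later requests for a are rerouted to b
    return b


class_1 = ClassRecord(0, "n1")
class_2 = ClassRecord(1, "n2")
merged = merge(class_1, class_2)
again = merge(class_1, class_2)         # second request finds equal class numbers
```

After the merge, a request addressed to class_1 reaches the composite class, and repeating the request leaves the membership unchanged.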


Program 10.12 implements the congruence closure algorithm described in Sec- 
tion 10.2. Having merged two classes, the closure algorithm is invoked (Pro- 
gram 10.7, R6) with the predecessors of the nodes in the merged classes (R1). 
The closure algorithm looks at all pairings of predecessors from the original two 
classes (R1,3). If a congruent pair is found, then the classes are merged by invok- 
ing the merge operation recursively (R6). This is achieved by sending an emerge 
message to one of the nodes. If a pair is not congruent, or already equivalent, 
nothing need be done (R5,7). 


closure([AP|APs],BPs,done,R) :-                         % R1
    AP := [class(ACn),args(ACns)|AP1],
    split(BPs,BPs1,[],BPs2,[]),
    closure1(BPs1,AP1,ACn,ACns,done,M),
    closure(APs,BPs2,M,R).

closure([],_,done,R) :-                                 % R2
    R := done.
closure1([BP|BPs],AP,ACn,ACns,done,R) :-                % R3
    BP := [class(BCn),args(BCns)|BP1],
    AP := [merge(AP1)|AP2],
    closure2(AP1,ACn,ACns,BP1,BCn,BCns,done,M),
    closure1(BPs,AP2,ACn,ACns,M,R).
closure1([],AP,_,_,done,R) :-                           % R4
    R := done,
    AP := [].

closure2(AP,Cn,_,BP,Cn,_,L,R) :-                        % R5
    AP := [], BP := [], R := L.
closure2(AP,ACn,Cns,BP,BCn,Cns,L,R) :-                  % R6
    ACn =\= BCn |
    AP := [emerge(BP,L,R)].
closure2(AP,ACn,ACns,BP,BCn,BCns,L,R) :-                % R7
    ACn =\= BCn, ACns =\= BCns |
    AP := [], BP := [], R := L.


Program 10.12: Closure Algorithm 


10.5 Synchronization Issues 


The sequential algorithm, although simple, involves a number of subtle critical 
sections. These only became apparent as a result of attempting to implement
the algorithm in Strand. The first critical section appears in the find algorithm.
When inspecting the predecessor list, it is necessary to ensure that none of the 
equivalence classes involved in the congruence check are modified. In addition, it 
is necessary to ensure that between the time the decision has been made to add 
a node and the time that the node is actually added, the graph does not change 
in a manner that would invalidate the decision. In the closure algorithm, it is 
necessary to mutually exclude processes from using classes that are being merged. 

Even though Strand is a concurrent programming language, we must be careful 
to ensure that these critical sections are implemented correctly. The implemen- 
tation of the closure algorithm shown in this chapter maintains correct operation 
by serializing find and closure operations; this is accomplished by the top level of 
the theorem prover (Program 10.8). A short-circuit is threaded through the pro- 
gram so that only one request is active at a time. As a result, no speedup can be 
expected from this implementation. Although the program itself is a concurrent 
formulation, it implements a sequential algorithm. 

This aspect of the implementation emphasizes an important facet of concurrent 
languages: A concurrent semantic model does not, in itself, solve the difficult 
problems associated with algorithm design. The use of a concurrent semantics 
eases the task of specifying concurrent programs and gives a degree of freedom to 
the compiler writer. 

A brief examination of the algorithm implemented shows that complete se- 
rialization is overly restrictive. In fact, we need only ensure that simultaneous 
updates to the same equivalence class are prevented. We are currently investigat- 
ing an algorithm that extends the work presented here to allow concurrent access 
to the graph. In addition, it allows the recursive merge operations resulting from 
a single closure to proceed in parallel. We believe that this constitutes the mini- 
mal constraint on concurrency within the original algorithm and that it will allow 
graph accesses to be pipelined. 

Synchronization in the new algorithm is achieved by locking sets of equivalence 
classes to allow atomic modifications to the graph. This can be implemented in 
Strand using monitors as described in Section 5.1. The algorithm must be carefully 
designed and implemented to circumvent the problems of deadlock and starvation. 
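
One conventional way to avoid deadlock when several equivalence classes must be locked together is to acquire the locks in a fixed global order, for example by ascending class number. The following Python sketch illustrates that idea only. It is not the algorithm under investigation; the names EquivClass and with_classes_locked, and the use of thread locks in place of Strand monitors, are assumptions made for illustration. The sketch does not address starvation.

```python
import threading


class EquivClass:
    """Hypothetical stand-in for an equivalence class guarded by a lock."""

    def __init__(self, cid):
        self.cid = cid
        self.lock = threading.Lock()


def with_classes_locked(classes, action):
    """Run action() while holding the locks of all the given classes."""
    # Deduplicate, then lock in ascending class number so that every
    # thread acquires shared locks in the same order.
    ordered = sorted({c.cid: c for c in classes}.values(), key=lambda c: c.cid)
    for c in ordered:
        c.lock.acquire()
    try:
        return action()
    finally:
        for c in reversed(ordered):
            c.lock.release()


x, y = EquivClass(1), EquivClass(2)
updates = []


def worker(first, second, label):
    # Each worker names the classes in a different order; the helper
    # still locks them in the same order.
    for _ in range(1000):
        with_classes_locked([first, second], lambda: updates.append(label))


threads = [
    threading.Thread(target=worker, args=(x, y, "xy")),
    threading.Thread(target=worker, args=(y, x, "yx")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Had each worker simply locked its classes in the order given, the two threads could each hold one lock while waiting for the other; sorting by class number rules out that cycle.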


10.6 Conclusions 


This chapter presents an implementation of the congruence closure algorithm. 
It demonstrates how perpetual processes can be used to represent dynamically 
changing graphs. The program extends the techniques presented in Chapter 4 to 
a non-trivial problem and illustrates the deeper issues in concurrent programming. 
We have seen an apparently innocuous sequential algorithm become a reasonably 
sophisticated concurrent one, exposing problems such as mutual exclusion, dead- 
lock prevention and starvation. We observe that the language does not, in itself, 
solve the important problems; however, it does provide insight into where the
problems lie and offers simple ways to express solutions. In conclusion, algorithm
design should be the focus of attention in parallel programming, not the particular 
aspects of the programming language. 





Chapter 11 


Aligning Genetic Sequences 


Ralph Butler, Tracye Butler, Ian Foster, 
Nicholas Karonis, Robert Olson, Ross Overbeek, 
Nathan Pfluger, Morgan Price, Steve Tuecke 


Mathematics and Computer Science Division 
Argonne National Laboratory 
Argonne, IL 60439 


11.1 Introduction 


The incredibly rapid progress in molecular biology is now making headlines in ma- 
jor newspapers. Advances are reported on a weekly basis. The growing interest 
in molecular biology will inevitably make it one of the more important applica- 
tion areas in computer science. Currently, computers play a secondary role but 
projected demands for computation are enormous. Let us give a few examples. 

Genbank (one of the most widely distributed databases containing known se- 
quences) contains just over 20,000 entries. This number has been doubling about 
every 15 months, which might well seem like rapid growth. However, in order to 
take on a project like sequencing the genetic material for an advanced organism 
such as a human, the database will have to grow (fairly rapidly) to the point where 
it can absorb over a million new sequences each day. 

Most of the information that one wishes to include in such a database is cur- 
rently either uncertain or unknown. For example, the three-dimensional structure 
of proteins (information that is critical for many applications) is known for only 
an extremely small fraction of the sequenced proteins. This means that as data 
is added to the database, one would like to attempt to infer speculative data con- 
cerning closely related sequences. Specifically, if the structure of one sequence
were discovered, it would become possible to make intelligent guesses concerning
corresponding sequences in closely related organisms. 

Since one is interested in sequences that are closely related, a common query 
involves searching the database for sequences that are genetically similar. Cur- 
rently, such a search requires a modest amount of computing resources (a single 
search frequently takes on the order of two hours of VAX time). When one con- 
siders the issues involved with scaling up both the size of the database and the 
rate of requests by as much as five orders of magnitude, the problems become 
challenging. 

It seems possible that the computational aspects of molecular biology will be 
so interesting and so intense that much of the basic research in this area will 
be completely dependent on the construction and maintenance of an integrated 
database. 

One of our goals is to aid in creating such a database, along with the tools 
that would allow effective access to the stored information. We are working to 
establish a software environment and an actual database that satisfy the following 
objectives: 


1. The implementation must be efficient in the sense that it must support loads 
of the sort projected above. 


2. It must be easy to extract data from the database and to experiment with 
algorithms to utilize the data in unpredictable ways. 


3. We should be able to easily exploit advances in hardware environments that 
would allow substantial improvements in basic capabilities. In particular, it 
seems likely that we will wish to exploit a range of multiprocessing configu- 
rations as such systems are introduced into the commercial marketplace. 


Given our desire to fulfill these objectives, a technology based on the use of con- 
current logic programming, with some of the major computationally-intensive pro- 
gram components written in C, seemed appropriate. To explore the suitability of 
such an approach, we have investigated its use in attacking a prototypical compu- 
tational problem, the problem of aligning a set of sequences of genetic material. 


11.2 The Problem 


For the purposes of this discussion, a sequence of genetic material may be thought 
of as a string of characters from some fairly small alphabet. In particular, we will 
take most of our examples from sequences of RNA, which amount to strings from 


the alphabet {a,c,g,u}. For example, the following are typical short sequences of 
RNA: 


augcgagucuauggcuucggccauggcggacggcucauu
augcgagucuaugguuucggccauggcggacggcucauu
augcgagucuauggacuucggccauggcggacggcucagu
augcgagucaaggggcucccuugggggcaccggcgcacggcucagu


The reader should note that these sequences are similar, but not quite identical. 
In fact, they represent corresponding pieces of genetic material from four distinct 
(but closely related) organisms. There is a great deal that can be learned from 
such related pieces of genetic material. One critical operation in extracting this 
information involves aligning the sequences. An alignment is created by lining up 
the sequences with corresponding sections directly above one another. To make 
corresponding sections line up, dashes are inserted into the sequences. These 
dashes are called indel characters, since they represent areas in which insertions 
or deletions of characters are required to match up the corresponding sections of 
the sequences. For example, the following is an alignment of the four sequences 
given above: 


augcgagucuauggc----uucg----gccauggcggacggcucauu
augcgagucuauggu----uucg----gccauggcggacggcucauu
augcgagucuauggac---uucg----gccauggcggacggcucagu
augcgaguc-aaggggcucccuugggggcaccggcgcacggcucagu


The application discussed in this chapter uses a Strand program to automatically 
generate such alignments. The sets that we align will normally contain from 2 to 50 
sequences; the individual sequences will contain between 10 and 2000 characters. 
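
The defining properties of an alignment can be checked mechanically: every row has the same length, rows contain only alphabet characters and the indel character, and deleting the indel characters from a row recovers the original sequence. The Python sketch below checks this for the four example sequences, transcribed here using only the alphabet {a,c,g,u} and the dash as the indel character. The function names are hypothetical, and the check says nothing about whether an alignment is a good one.

```python
ALPHABET = set("acgu")
INDEL = "-"

sequences = [
    "augcgagucuauggcuucggccauggcggacggcucauu",
    "augcgagucuaugguuucggccauggcggacggcucauu",
    "augcgagucuauggacuucggccauggcggacggcucagu",
    "augcgagucaaggggcucccuugggggcaccggcgcacggcucagu",
]

alignment = [
    "augcgagucuauggc----uucg----gccauggcggacggcucauu",
    "augcgagucuauggu----uucg----gccauggcggacggcucauu",
    "augcgagucuauggac---uucg----gccauggcggacggcucagu",
    "augcgaguc-aaggggcucccuugggggcaccggcgcacggcucagu",
]


def strip_indels(row):
    """Remove indel characters, recovering the unaligned sequence."""
    return row.replace(INDEL, "")


def is_alignment_of(rows, seqs):
    """Check that rows form a well-formed alignment of seqs, in order."""
    same_length = len({len(r) for r in rows}) == 1
    valid_chars = all(set(r) <= ALPHABET | {INDEL} for r in rows)
    recovers = [strip_indels(r) for r in rows] == list(seqs)
    return same_length and valid_chars and recovers


ok = is_alignment_of(alignment, sequences)
width = len(alignment[0])
```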


11.2.1 Why Create Alignments? 


A great deal can be learned by studying similarities and differences in sets of 
sequences. For example, the sequences that we have worked with are known 
to come from corresponding genetic material in individuals from closely related 
species. An alignment of these sequences allows a biologist to extract a fairly 
accurate guess about how the organisms relate in the tree of evolution. 

Suppose that the sequences come from individual members of the same species 
(rather than from a variety of species). One might wish to search for genetic 
differences that relate to observable differences in the individuals. This might 
allow the differences that cause certain diseases to be isolated. Again, aligning the 
sequences might reasonably be considered the first step. 

Finally, consider the case where a biologist has just produced a given sequence 
in his laboratory, and hypothesizes that it represents genetic material that per- 
forms some well-defined function; let us call it a “widget.” The biologist might 
well search through a growing database of all known genetic sequences in the hope 
of finding occurrences of similar widgets. Once a set of such sequences has been 
extracted, an alignment might be used to illustrate the exact variations on what 
is believed to be a common theme. 


11.2.2 What Is a Correct Alignment? 


Before describing an algorithm to generate alignments, let us consider the issue 
of exactly how one might determine whether or not such an algorithm produces 


256 Chapter 11. Aligning Genetic Sequences 


“good” alignments. There are at least three reasonable positions that can be 
taken: 

Structural Standard. Sequences really represent molecules or pieces of 
molecules. These molecules usually have a fairly well-defined three-dimensional 
structure. That is, they fold into a characteristic shape. Suppose that all the 
molecules represented by a set of sequences have roughly the same shape. Then, 
it would make sense to carefully match up the corresponding sections. It’s true 
that some might have unique bumps, and some might be missing sections alto- 
gether, but the essential structural theme might be apparent enough to allow a 
meaningful assignment of correspondence. In such cases, the notion of “correct 
alignment” might well be based on a structural standard. Biologists have access to 
a great deal of physical data accumulated through years of experiments designed 
to reveal structural information. 

Genetic Distance Standard. A second approach might be based on defining 
some notion of genetic distance. One sequence can be transformed into another 
by changing individual characters, inserting characters, and deleting characters. 
While there are infinitely many ways to perform such a transformation, it is pos- 
sible to meaningfully define the notion of a “minimal” set of operations. If one 
then hypothesizes that evolution would most likely have occurred through such a 
minimal path, then a “correct alignment” should depict just such a minimal set 
of operations. We call such a view the genetic distance standard. 

Operational Standard. One can simply take alignments that have been 
carefully produced by biologists (frequently taking several years) and call them 
“correct.” Clearly, they reflect the myriad of considerations that really are weighed 
by the practicing biologist. This simplistic view does not allow one to make any 
judgment at all about alignments produced automatically. However, it can be gen- 
eralized to say something like “Any alignment that a competent biologist asserts 
is correct should be viewed as correct.” We call this the operational standard. 
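
The "minimal" set of operations mentioned under the genetic distance standard can be made concrete with a small dynamic programming sketch. The Python function below is illustrative only: the name edit_distance and the unit cost charged for each change, insertion, and deletion are our assumptions, not the cost model of any particular alignment program.

```python
def edit_distance(a, b):
    """Smallest number of single-character changes, insertions and
    deletions that transforms string a into string b (unit costs assumed)."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or change
        prev = curr
    return prev[-1]
```

For two sequences of lengths n1 and n2 this fills an n1 by n2 table, which is why the pairwise problem is tractable while its direct generalization to many sequences is not.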


We have adopted the operational standard. This has obvious drawbacks, since 
competent biologists do, in fact, argue over which of two alignments for the same 
set of sequences is correct. In such cases, we are perfectly content to call both 
alignments “correct” and let them both stand (until further information leads the 
community of biologists to rule one out). 

Our goal was to write a program that would take as input a set of sequences 
and produce as output a correct alignment. While not completely successful (our 
alignments do differ somewhat from those produced by expert humans), the pro- 
gram does produce alignments that have been judged by a competent biologist to 
be substantially better than other automatically generated alignments. 


11.3 Our Alignment Algorithm 


A fair body of literature has been generated about how to align two sequences.
Most of it adopts the genetic distance standard and relies on dynamic
programming algorithms. A readable introduction can be found in Sankoff and
Kruskal [66] (see, in particular, Chapter 2). A straightforward generalization of
these approaches to larger sets of sequences is computationally too expensive (for
m sequences of length n, the algorithm is O(n^m)). Hence, a variety of other
approaches has been tried (for a summary, along with pointers to the relevant
literature, see von Heijne [36]). Our experience has been that these approaches
are viewed as useful to biologists, but that the results differ substantially from
alignments produced by biologists manually.

Our algorithm is based on the notion of critical subsequences. A critical 
subsequence (of a single sequence) is a short string of characters (say, 8 to 20 
characters in length) that occurs within a sequence and is “not at all similar to” 
any other string that occurs within the same sequence. For the purposes of this 
discussion, we will say that two strings are “similar” if they differ in less than 30% 
of their characters, and a string is a critical subsequence if it is not similar to any 
other string that occurs within the sequence. 
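
The definition above can be sketched in a few lines of Python. This is a simplified illustration, not the C kernel described later in the chapter: it assumes that candidate strings all have one fixed length k, that two strings are compared position by position, and that overlapping windows count as "other" strings. The names similar and critical_subsequences are ours.

```python
def similar(s, t, threshold=0.30):
    """True if equal-length strings s and t differ in fewer than
    threshold * len(s) positions."""
    if len(s) != len(t):
        raise ValueError("strings must have equal length")
    mismatches = sum(1 for x, y in zip(s, t) if x != y)
    return mismatches < threshold * len(s)


def critical_subsequences(seq, k=8):
    """Return (offset, string) pairs for every length-k window of seq
    that is not similar to any other length-k window of seq."""
    windows = [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]
    return [(i, w) for i, w in windows
            if not any(j != i and similar(w, v) for j, v in windows)]
```

This direct comparison of every window with every other window is quadratic in the sequence length, which hints at why the computation of critical subsequences is expensive for sequences of 1500 to 2000 characters.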

Suppose that a string C is a critical subsequence of two sequences, S1 and S2.
If we think of S1 and S2 as being genetically related, it seems highly likely that 
the two occurrences of C must line up exactly in the final alignment. Otherwise, 
there must exist a C1 in S1 and a C2 in S2, such that C aligns with each of these 
sequences: 


However, the genetic distance between these corresponding pieces of the alignment
is fairly large. While it is quite possible that a section of genetic material like C
will be displaced during the process of evolution, it is quite rare. Hence, we view
the presence of identical critical subsequences as extremely important clues as to 
how the final alignment should appear. 

When a critical subsequence occurs in two or more sequences, we call the set 
of occurrences a pin. Our algorithm will attempt to create an alignment in which 
as many pins as possible align exactly. Indeed, our analysis shows that, for the
alignments produced by biologists, the strings in pins do (almost always) line up. 
However, occasionally incompatible pins are detected. In some cases, it appears 
that a section of genetic material has been “moved” over a substantial distance. 
It is important that such inconsistencies be detected and eliminated; we call such 
a procedure “cleaning a set of pins.” If we consider two sequences as lines, and 
we view pins as arcs connecting points on the lines, then inconsistent pins are 
detected by looking for arcs that cross. By carefully removing a minimal number 
of arcs, we produce a set of pins that form a consistent set of constraints. The 
complete algorithm can be stated more precisely as follows: 


1. Compute a set of clean pins for the sequences to be aligned. Locate the “best 
pin”, which is a pin containing critical subsequences from the largest num- 
ber of input sequences. If several pins contain the same number of critical 
subsequences, choose the one closest to the middle of the input sequences. 



2. Reorder the sequences so that all sequences connected by the best pin occur 
above those that are unpinned. Now the pin may be thought of as dividing 
the original set of sequences into three regions as shown in Figure 11.1: The 
section of the pinned sequences to the left of the pin, the section to the right 
and the set of unpinned sequences. 


3. Align the left, right and unpinned sections. 


4. Integrate the pinned and unpinned alignments. 
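
The selection rule in step 1 can be sketched as follows. The representation is an assumption made for illustration: a pin is a list of (sequence index, position) pairs, and "closest to the middle" is measured as the distance between the mean position of the pin and half of a nominal sequence length. The name choose_best_pin mirrors the process used later in Program 11.1, but this Python body is ours.

```python
def choose_best_pin(pins, sequence_length):
    """Pick the pin that touches the most sequences; break ties by
    choosing the pin whose mean position is nearest the middle.
    Each pin is a list of (sequence_index, position) pairs."""
    if not pins:
        return None  # no pin available: fall back to pairwise alignment
    middle = sequence_length / 2

    def rank(pin):
        mean_position = sum(position for _, position in pin) / len(pin)
        # Larger pins first (hence the minus sign), then nearest the middle.
        return (-len(pin), abs(mean_position - middle))

    return min(pins, key=rank)
```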


[Figure: regions labeled Left Pinned and Right Pinned]


Figure 11.1: Splitting Sequences Using a Pin


Steps 1 and 2 split the original problem into three (hopefully) much smaller sub- 
problems, which are solved in step 3. Since these smaller problems have exactly 
the same structure as the original problem, it might seem that recursion would be 
appropriate. Indeed, it can be used effectively, as subsequences may be critical in 
one of the small problems while not being critical in the original problem. 

There are numerous unspecified aspects of the above approach. For example, 
problems eventually become so small that no pins exist between sequences. One 
of the dynamic programming algorithms must then be employed. Another major 
issue is how precisely to achieve the last step. These issues are beyond the scope 
of this case study. 
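
The procedure for cleaning a set of pins is not spelled out in this chapter. For the special case of two sequences, one plausible sketch treats each pin as an arc (p1, p2) joining position p1 in the first sequence to position p2 in the second, and keeps a largest subset of arcs in which both positions increase together. The names crosses and clean_pins, and the choice of a longest increasing subsequence computation, are our assumptions.

```python
def crosses(a, b):
    """Two arcs cross if their end points appear in opposite orders."""
    return (a[0] - b[0]) * (a[1] - b[1]) < 0


def clean_pins(pins):
    """Return a largest subset of (p1, p2) arcs in which no two cross,
    which removes the minimal number of arcs (quadratic-time sketch)."""
    ordered = sorted(pins)
    if not ordered:
        return []
    best_len = [1] * len(ordered)   # longest consistent chain ending at i
    prev = [-1] * len(ordered)      # predecessor of i in that chain
    for i, (p1, p2) in enumerate(ordered):
        for j in range(i):
            q1, q2 = ordered[j]
            if q1 < p1 and q2 < p2 and best_len[j] + 1 > best_len[i]:
                best_len[i] = best_len[j] + 1
                prev[i] = j
    i = max(range(len(ordered)), key=best_len.__getitem__)
    kept = []
    while i != -1:                  # walk the chain back to its start
        kept.append(ordered[i])
        i = prev[i]
    return kept[::-1]
```

With more than two sequences a pin constrains several lines at once, so a real implementation needs a more general consistency check than this pairwise one.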


11.4 Strand as an Implementation Vehicle 


A variety of considerations influence our choice of an implementation strategy. 
First, there is the major decision of whether to use a relatively high-level language 
(like Strand or Lisp) or a lower-level language (like C). The tradeoffs are
well-known and hotly debated. High-level languages offer the ability to rapidly alter
algorithms and experiment with variations; lower-level languages offer improved 
performance. 

Early experiments convinced us that performance might well prove to be an 
important issue. Versions of our algorithms (used on sets of 40 to 50 sequences, 
each of which contained 1500 to 2000 characters) written in C consumed in excess
of five hours of processing time on a Sun 3/160 workstation. The computation of
critical subsequences, in particular, has been studied by other researchers [48]; it 
is a computationally-intensive operation that can consume substantial processor 
and/or memory resources. Although we cannot precisely quantify the relative 
costs of doing such an operation in a higher-level language versus C, it seems 
likely that the ratio of execution times would be in the range of 5 to 10 using 
existing implementations. 

The advantages of using a higher-level language for ease of alteration and even- 
tual exploitation of multiprocessors also became apparent. Because the number 
of alternatives that all require evaluation seems quite large, the time between the 
proposal of an algorithm and the completion of its implementation is of critical 
significance. One wishes to be able to formulate conjectures and test them as 
rapidly as possible. 

Our decision to adopt a bilingual approach, with the upper levels in Strand 
and a limited set of kernels in C, reflects our very subjective reaction to the above 
constraints. It is not necessarily critical that we have optimal performance on a 
program that computes alignments. After all, if the algorithm produces correct 
output, it replaces weeks or months of human effort. However, the problem of 
computing alignments is only one of a number of computational problems facing 
biologists. Many of these other tasks (such as searching a fairly large set of se- 
quences for those that are similar to a given sequence) are both computationally 
intensive and occur often. We anticipate a situation arising within a few years in 
which thousands of such requests will have to be handled daily. In that environ- 
ment, performance will definitely be an issue. 

The kernel operations that we implemented in C were fairly limited. By far the 
most effort went into writing routines to compute critical subsequences, to form 
pins, and to implement a dynamic programming algorithm. The latter routine is 
used to align two individual sequences in which no pins constrain the alignment. 


11.5 Developing the Bilingual Program 


The bilingual alignment program represented a development of an earlier program 
written entirely in C. In the process of developing the bilingual program, we refined 
both the implementation of the low-level routines and the top-level algorithm. This 
refinement process was aided by the existence of a clear specification in a high-level 
language. 

A major concern when designing a bilingual program is to achieve a clean 
separation between the two layers of code. A first step in this direction is to 
identify additional abstract data types that must be implemented. In our case, a 
single data type sequence was required. A sequence is thought of as a string of 
characters that come from some specified organism and have an attached location 
number. For example: 


aagcgc from homo sapiens at 1437 



This might be used to represent a short sequence of genetic material from a human. 
If there are embedded indels, the location is thought of as applying to the first 
non-indel that occurs in the sequence. For example: 


-aa-gcgc from homo sapiens at 1437 


This might represent a sequence produced during an alignment; in this case, the 
location 1437 applies to the first “a” that occurs in the sequence. With these 
comments in mind, a sequence is composed of 


1. An identifier. 


2. A location that specifies the location in an input sequence of the first non- 
indel character. 


3. A length (the number of characters in the sequence). 


4. A string of characters that make up the sequence. 


The sequence could have been represented in Strand as a user-defined data 
type. However, we found it more convenient to represent it as a tuple with the 
form: 


{ Identifier, Location, Length, String } 


This made it possible to define most basic operations on sequences, such as “re- 
trieve the length of a sequence,” as Strand processes. We also needed to provide 
a small set of user-defined operations to perform other operations on sequences. 
These are described here in full to emphasize how few operations were required. 
The following set of user-defined operations performed basic manipulations. 


read_sequence_set(File,ListOfSequences) reads a set of sequences from
a file and constructs a list of sequences. 


extract(String, Start, Length, SubString) retrieves the SubString of String spec- 
ified by Start and Length. 


char_in_sequence(String,Disp,Char) extracts a single character at a
displacement of Disp into String.


We also required the following more complex functions that manipulate and create 
sequences. 


combine_alignments(Align1,Align2,Alphabet,Output,Distance) takes two
existing alignments and produces an Output alignment using a specified
Alphabet; also generates a measure of the Distance between the two
alignments.



critical_points(Seq,CPs) determines the critical subsequences of a given
sequence. Seq specifies the input sequence, and CPs is assigned a
list of critical subsequences.


form_pins(CPList,PinList) constructs a set of pins from the critical subse- 
quences that occur in a set of sequences. CPList is a list, each element 
of which is a list of critical subsequences from a single sequence (as 
produced by critical_points). PinList is assigned a list of pins. Each pin 
is a list of critical subsequences from distinct input sequences. 


glue(ListOfAlignedChunks,ConcatenatedAlignment) concatenates a set of 
alignments. 


strip_indels(Sequence,SequenceWithoutIndels) removes indels from a
sequence.
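
To make the sequence data type and the simplest of these operations concrete, here is a hedged Python analogue. It is not the C code described in this chapter: the zero-based offsets, the use of "-" as the indel character, and the helper make_sequence are assumptions made for the sketch.

```python
from typing import NamedTuple

INDEL = "-"  # assumed indel character


class Sequence(NamedTuple):
    """Python analogue of the tuple {Identifier, Location, Length, String}."""
    identifier: str
    location: int   # location of the first non-indel character
    length: int     # number of characters, indels included
    string: str


def make_sequence(identifier, location, string):
    """Hypothetical helper that fills in the Length field."""
    return Sequence(identifier, location, len(string), string)


def extract(string, start, length):
    """Substring of string given by start (zero-based here) and length."""
    return string[start:start + length]


def char_in_sequence(string, disp):
    """Single character at displacement disp (zero-based here)."""
    return string[disp]


def strip_indels(seq):
    """Copy of seq with every indel removed; the location is unchanged
    because it already refers to the first non-indel character."""
    return make_sequence(seq.identifier, seq.location,
                         seq.string.replace(INDEL, ""))
```

Applied to the example above, strip_indels turns "-aa-gcgc" at location 1437 into "aagcgc" at the same location.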


The implementation of these operations required about 2000 lines of C code. The 
size of this code may have to be expanded slightly and we may well implement 
alternative versions of some of the critical operations. However, it appears likely 
that the bulk of future development will be in the Strand code, which currently 
consists of about 600-700 lines. 


11.6 Using Multiprocessors 


The ability to implement a database (and the assorted tools required to use it 
effectively) will hinge on exploiting parallel computers. Hence, we are interested in 
investigating the effectiveness of Strand as a vehicle for coordinating a distributed 
computation. 

Multiprocessors are likely to be important in genome projects in two ways: 
First, they may be used to perform searches against databases of thousands or 
millions of sequences. Second, they may be used to speed up particular time- 
consuming computations involving a small number of sequences. We envisage that 
both applications will be important: In a typical scenario, a scientist will perform 
a simple search against a large database to retrieve a small set of sequences and 
will then perform more sophisticated analysis on these sequences. 

It is a fairly simple exercise to reduce database search times using a multi- 
processor. In the absence of appropriate indexes, this type of search currently 
involves comparing a given sequence with each entry. Partitioning and distribu- 
tion are straightforward. Parallel execution of a single computationally-intensive 
program can be substantially more difficult. We may have little information about 
how best to define and construct subtasks; in addition, irregularities in the prob- 
lem may lead to subtasks varying widely in number and size. Data-dependencies 
between tasks can also cause difficulties. We hence chose to investigate the use 
of Strand as a tool for executing a program of this latter type: our alignment 
program. 
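
The straightforward case, partitioning a database search, can be sketched as follows. The code is a Python illustration under stated assumptions: the database is a list of strings, the comparison is a crude position-by-position mismatch fraction, and the names mismatch_fraction, search_partition, and parallel_search are ours. Threads are used only to show the partitioning; in standard Python they would not by themselves speed up a compute-bound comparison.

```python
from concurrent.futures import ThreadPoolExecutor


def mismatch_fraction(a, b):
    """Fraction of positions (relative to the longer string) at which
    a and b do not carry the same character."""
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - same / n


def search_partition(query, entries, threshold):
    """Compare the query with each entry of one partition."""
    return [e for e in entries if mismatch_fraction(query, e) < threshold]


def parallel_search(query, database, workers=4, threshold=0.30):
    """Deal the entries out to the workers, search each partition
    independently, and merge the hits."""
    parts = [database[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda part: search_partition(query, part, threshold), parts)
        return sorted(hit for part in results for hit in part)
```

Every partition has the same structure and roughly the same cost, which is exactly what the alignment program lacks.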



The rest of this case study outlines the Strand alignment program, describes 
the techniques used to execute it on a parallel computer and presents the results 
of a preliminary performance study. 


11.6.1 The Strand Program 


Program 11.1 implements the top level of the alignment algorithm presented in
Section 11.3; user-defined operations are labeled.


align_chunk(Chunk,AlignedChunk) :-                        % R1
    pins(Chunk,BestPin),
    divide(Chunk,BestPin,AlignedChunk).
pins(Chunk,BestPin) :-                                    % R2
    cps(Chunk,CpList),
    form_pins(CpList,PinList),                            % User
    choose_best_pin(Chunk,PinList,BestPin).
cps([Seq|Sequences],CpList) :-                            % R3
    CpList := [CPs|CpList1],
    critical_points(Seq,CPs),                             % User
    cps(Sequences,CpList1).
cps([],CpList) :- CpList := [].                           % R4
divide(Seqs,BestPin,Algnmnt) :-                           % R5
    BestPin =\= [] |
    split(Seqs,BestPin,Left,Right,UnPinned),
    align_chunk(Left,LAlgnd),
    align_chunk(Right,RAlgnd),
    align_chunk(UnPinned,UnPAlgnd),
    combine(LAlgnd,BestPin,RAlgnd,UnPAlgnd,Algnmnt).
divide(Seqs,[],Algnmnt) :-                                % R6
    basic_align_chunk(Seqs,Algnmnt).


Program 11.1: Alignment program top level. 


The align_chunk process aligns a set of sequences (a chunk) by attempting 
to split the chunk using a pin. This splitting yields three chunks: left and right 
pinned chunks and an unpinned chunk. These are aligned independently and the 
three subalignments are combined to produce the complete alignment (R5). At 
each recursive call the algorithm computes critical points for each sequence in the 
chunk (R3,R4), forms a clean set of pins using form_pins and selects one of these 
as the best pin (R2). 



As noted previously, the alignment problem is sufficiently computationally in- 
tensive to benefit from parallel execution. For example, aligning a relatively small 
test data set required about 50 minutes on a single computer of an Encore Multi- 
max. Furthermore, the divide-and-conquer strategy employed in our algorithm is 
naturally suited for parallel evaluation: Each sub-alignment that results when a 
chunk is partitioned can potentially be performed on a different computer. How- 
ever, parallel execution is not straightforward because the alignment problem has 
an irregular structure. The number and size of processes created is totally data- 
dependent, cannot easily be predicted from the input data, and varies considerably 
from one problem to another. 

We addressed these difficulties by implementing a scheduler that supports a 
load-balancing strategy. As shown in Chapter 8, a Strand program must be modi- 
fied to execute in conjunction with a scheduler. The techniques we used to achieve 
this modification have some attractive qualities. We could have decided on a parti- 
tioning for our program and transformed it by hand (as was done with the Triangle 
program in Chapter 8). However, initial experiments convinced us that it was im- 
portant to be able to rapidly express and evaluate a range of possible partitioning 
strategies. Hence, we developed source-to-source transformation tools that could 
be applied to a program to automatically generate a program that exploited a 
particular strategy. Two such tools were developed. The first requires the pro- 
grammer to specify which processes are to be passed to the scheduler. The second 
determines this automatically using compile-time analyses. The design and im- 
plementation of these tools is beyond the scope of this case study. However, the 
ease with which they could be developed is in our opinion a significant advantage 
of Strand technology. 


11.6.2 The Scheduler 


The load-balancing strategy is based on the manager-worker model [10,12,74]. 
The scheduler consists of a central manager plus a set of workers. Each worker 
repeatedly obtains a unit of work (or task) from the manager and executes this 
work to completion; the manager allocates tasks to workers as required. 

Our scheduler differs from that described in Section 8.4 in two respects. First, it 
allows workers to contribute to the task pool maintained by the manager. Second, 
it allows for data dependencies between tasks in a similar fashion to the scheduler 
presented in Section 5.3. A data dependency exists between two tasks A and B, if 
B requires data produced by A. As our scheduler requires that each worker execute 
only a single task at a time, deadlock can occur if data dependencies are not taken 
into account. For example, consider what happens in a system containing a single 
worker if task B is allocated to that worker before task A. 

The process structure used to implement the scheduler is illustrated in Fig- 
ure 11.2. It forms a star with a manager and filter at the center and workers 
at the spokes. Each worker has a stream to the manager that it may use to re- 
quest work. Each worker also has a stream to a filter that it may use to append 
contributions to the task pool. Finally, a stream links the filter and the manager. 



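[Figure: star-shaped process structure, with the manager and filter at the center and the workers on the spokes]


Figure 11.2: Scheduler Process Structure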





Workers pass tasks to the filter in bundles. A bundle is a set of tasks, some 
of which may be designated as dependent on others. The function of the filter 
process is to delay each task until other tasks on which it is dependent have 
completed execution. Hence, the manager only receives tasks that can be executed 
immediately. 

As observed in Section 8.4.1, a manager-worker scheduler can naturally be 
executed on a ring virtual machine. The manager is created on the initial node; 
a worker is created on each successive node. Each worker is given a stream to the 
manager, thus creating the star topology. In outline: 


scheduler(...) :-
    manager(...),
    filter(...),
    workers(...)@fwd.

workers(...) :-
    worker(...),
    workers(...)@fwd.
workers(...).


The complete scheduler program is given as Program 11.2. It exports a single 
process definition, scheduler, which has the form: 


scheduler(Count, WorkerMod,FirstTask) 


where Count is the number of workers to create, WorkerMod is the name of the 
module that contains the worker definition and FirstTask is an initial task to be 
placed in the task pool. The scheduler creates the filter and manager (R3) and
spawns N workers around a ring (R7,8). The manager simply matches requests
for work (R) with tasks (W), until no more work is available (R4-6).


The filter receives bundles of tasks from workers (R9). A bundle has the form:


{Tasks, DependentTask, Done}


The first component, Tasks, is a list of immediate tasks. These have no data 
dependencies and can hence be passed to the manager for immediate allocation. 
The second component, DependentTask, is a single dependent task: This is not to
be executed until all Tasks have been completely processed. The third component, 
Done, is a variable to be assigned a value when DependentTask, and hence Tasks, 
have completed. 

The filter process creates a forward process to pass the immediate tasks to 
the manager. A task is passed as a tuple of the form {Task,Done}, where Done 
is a variable to be assigned a value when the task is completed (R11,12). An 
await process is also created: This waits until the Done variables associated with 
immediate tasks are assigned values and then passes any dependent task to the 
manager (R13-15). 

This concludes the presentation of the scheduler. It is important to note 
that the scheduler, although developed for the alignment program, is completely 
application-independent. It can be used to execute any program that adheres to 
the scheduler’s protocols. These protocols can be summarized as follows. 


1. A worker is defined by a process definition worker with two arguments, cor- 
responding to a request stream and a work stream. 


2. A worker generates a stream of variables representing requests for work units 
and accepts replies in the form {Unit,Done} or halt. 


3. When a worker receives a work unit, {Unit,Done}, it either: 


(a) executes Unit to completion and then assigns the value done to the task’s 
Done variable; or 


(b) executes part of the task and then links the task’s Done variable with 
(one or more) bundles of new tasks, expressed in the form described 
previously. 
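
The interplay of manager, filter, and worker can be illustrated with a small sequential simulation. This is a Python sketch of the protocol, not a translation of Program 11.2: it runs a single worker, represents a task as a function that returns either None (the task ran to completion) or a pair (tasks, dependent) describing a bundle, and uses a hypothetical Done class in place of a Strand variable.

```python
from collections import deque


class Done:
    """Stand-in for a single-assignment Done variable."""

    def __init__(self):
        self.assigned = False
        self._waiters = []

    def assign(self):
        self.assigned = True
        for callback in self._waiters:
            callback()
        self._waiters = []

    def when_assigned(self, callback):
        if self.assigned:
            callback()
        else:
            self._waiters.append(callback)


class Scheduler:
    def __init__(self):
        self.pool = deque()  # manager: tasks that may run immediately

    def submit_bundle(self, tasks, dependent, done):
        """Filter: pass immediate tasks on now and hold the dependent
        task back until every immediate task has completed."""
        remaining = len(tasks)

        def release_dependent():
            if dependent is None:
                done.assign()
            else:
                self.pool.append((dependent, done))

        def one_finished():
            nonlocal remaining
            remaining -= 1
            if remaining == 0:
                release_dependent()

        if not tasks:
            release_dependent()
        for task in tasks:
            task_done = Done()
            task_done.when_assigned(one_finished)
            self.pool.append((task, task_done))

    def run(self, first_task):
        """Single worker: request tasks until the pool is empty."""
        first_done = Done()
        self.pool.append((first_task, first_done))
        while self.pool:
            task, done = self.pool.popleft()
            bundle = task()
            if bundle is None:
                done.assign()                    # case (a) of the protocol
            else:
                tasks, dependent = bundle
                self.submit_bundle(tasks, dependent, done)  # case (b)
        return first_done.assigned
```

Because the dependent task enters the pool only after its immediate tasks have finished, even a system with a single worker cannot be handed task B before task A, which is the deadlock scenario described earlier.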


11.6.3 The Transformation 


A source-to-source transformation must be applied to the alignment program be- 
fore it can be executed with our scheduler. The transformation takes the original 
program and constructs a new program capable of both generating and process- 
ing tasks. An important feature of this transformation is that it can easily be 
performed automatically using the techniques described in Chapter 9. 

The essential aspects of the transformation are demonstrated using a simple 
example. Consider this outline program: 



-exports([scheduler/3]).                                  % R1
-machine(ring).                                           % R2
scheduler(N,WMod,FirstTask) :-                            % R3
    filter(Wks,Ws1), merger(Ws1,Ws2),
    manager([{FirstTask,_}|Ws2],Rs),
    merger(Rs1,Rs), merger(Wks1,Wks),
    workers(N,WMod,Wks1,Rs1)@fwd.
manager([W|Wk],[R|Rs]) :- R := W, manager(Wk,Rs).         % R4
manager([],[R|Rs]) :- R := halt, manager([],Rs).          % R5
manager(_,[]).                                            % R6
workers(N,WMod,Ws,Rs) :-                                  % R7
    N > 0 |
    N1 is N-1,
    Ws := [merge(W)|Ws1], Rs := [merge(R)|Rs1],
    WMod:worker(W,R),
    workers(N1,WMod,Ws1,Rs1)@fwd.
workers(0,_,Ws,Rs) :- Ws := [], Rs := [].                 % R8
filter([work(Wk,Dep,D)|In],Ss) :-                         % R9
    Ss := [merge(S1),merge(S2)|Ss1],
    forward(Wk,Vs,S1), await(Vs,Dep,D,S2),
    filter(In,Ss1).
filter([],Ss) :- Ss := [].                                % R10
forward([P|Wk],Vs,Ss) :-                                  % R11
    Vs := [Term|Vs1], Ss := [{P,Term}|Ss1],
    forward(Wk,Vs1,Ss1).
forward([],Vs,Ss) :- Vs := [], Ss := [].                  % R12
await([done|Vs],T,D,Ss) :- await(Vs,T,D,Ss).              % R13
await([],T,D,Ss) :- T =\= true | Ss := [{T,D}].           % R14
await([],true,D,Ss) :- Ss := [], D := done.               % R15


Program 11.2: Manager-Worker Scheduler



p(...) :-
    a(...),
    b(...),
    c(...),
    d(...).

Let us assume that the process p will be given to the scheduler as an initial 
task. We must decide which of the processes created by execution of p are to 
be dispatched for remote execution. Either programmer-supplied information or 
automatic analysis may be used to determine that after process a performs some 
initial computation, processes b and c are able to execute independently, and that 
when these processes terminate, d can execute. If b and c are judged sufficiently 
substantial, then this program is transformed to execute a locally and pass pro- 
cesses b, c and d to the scheduler. The process d is made dependent on b and c. 
The result of transforming the example program is outlined in Program 11.3. 


worker(Rs,Ws) :-                                          % R1
    Rs := [R|Rs1], worker1(done,Rs1,Ws,R).
worker1(done,Rs,Ws,{p(...),Done}) :-                      % R2
    Rs := [R|Rs1],
    p(...,Done,N,Ws,Ws1),
    worker1(N,Rs1,Ws1,R).
worker1(done,Rs,Ws,{d(...),Done}) :-                      % R3
    Rs := [R|Rs1],
    d(...,Done),
    worker1(Done,Rs1,Ws,R).
worker1(done,Rs,Ws,halt) :- Rs := [], Ws := [].           % R4
p(...,D,N,Ws,Ws1) :-                                      % R5
    a(...,done,N),
    p1(N,...,D,Ws,Ws1).
p1(done,...,D,Ws,Ws1) :-                                  % R6
    Ws := [{[b(...),c(...)],d(...),D}|Ws1].


Program 11.3: A Transformed Program


The worker process starts by requesting a task (R1). It then repeatedly requests
and processes tasks (R2,3) until told to halt (R4). A task is represented by a term
of the form {Process,Done}. Program 11.3 shows the worker1 rules that process
requests to execute p and d tasks. For brevity, rules for b and c are not shown;
they are similar to the d rule.

The process p is transformed to a process that executes the process a (R5) and 
then passes a bundle of tasks, containing processes b, c and d, to the scheduler 
(R6). Recall that a bundle has the form {Tasks,DependentTask,Done}. 

The transformed p process is invoked by the worker1 rule that deals with p 
requests (R2). The variable Done associated with the p request is not assigned a 
value at this point, as the task has not been completed: Instead, this variable is 
passed to the scheduler with the bundle. The scheduler will ensure that the Done 
variable is assigned a value only after the dependent process d has completed (at 
which point the other processes must also have completed). 

The other processes, b, c and d, are simply augmented with a short-circuit
and executed directly. The short-circuit is used to detect termination and assign 
a value to the associated Done variable. 

To illustrate the application of this transformation to the alignment program, 
we consider the following rule from Program 11.1: 


divide(Ss,BP,Algmt) :-                                    % R5
    BP =\= [] |
    split(Ss,BP,L,R,UP),
    align_chunk(L,LA),
    align_chunk(R,RA),
    align_chunk(UP,UnPA),
    combine(LA,BP,RA,UnPA,Algmt).


Assume that the three align_chunk processes are selected as likely-looking pieces of
work. Transformation of this rule then yields the following rules:


divide(Ss,BP,Algmt,Rs,Rs1,D,N) :-                         % R5'
    BP =\= [] |
    split(Ss,BP,L,R,UP,N),
    divide1(N,L,R,UP,BP,Algmt,Rs,Rs1,D).


divide1(done,L,R,UP,BP,Algmt,Rs,Rs1,D) :-
    Rs := [{[align_chunk(L,LA),
             align_chunk(R,RA),
             align_chunk(UP,UnPA)],
            combine(LA,BP,RA,UnPA,Algmt),D}|Rs1].


The original rule spawns processes to align the left, right and unpinned chunks 
and to combine the aligned sections. In contrast, the transformed rules pass 
these processes to the scheduler, hence making them available for execution by 
other workers. The combine process is made dependent on the three align_chunk
processes. 

The transformation that we have developed also supports what we call conditional
dispatch. The divide-and-conquer strategy adopted by the alignment program
generates a large number of tasks. These rapidly become too small to be
worthwhile distributing to other computers. We hence allow the programmer to
specify a minimum size for task data. Tasks with data smaller than this minimum
are executed locally.
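
Conditional dispatch amounts to a size test applied before a task is handed to the scheduler. The Python sketch below is illustrative only: measuring the size of a chunk as its total number of characters is our assumption, as are the names chunk_size and place_tasks.

```python
def chunk_size(chunk):
    """Assumed size measure: total number of characters in the chunk."""
    return sum(len(sequence) for sequence in chunk)


def place_tasks(chunks, min_size):
    """Split chunks into those to align locally and those worth
    dispatching to other computers via the scheduler."""
    local, remote = [], []
    for chunk in chunks:
        if chunk_size(chunk) < min_size:
            local.append(chunk)    # too small to be worth distributing
        else:
            remote.append(chunk)   # passed to the scheduler as a task
    return local, remote
```

Raising the minimum size reduces scheduling traffic at the price of fewer tasks available for load balancing, so the threshold is a natural parameter to vary in experiments.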

In summary, the alignment program is transformed to execute in conjunction 
with a manager-worker scheduler. We defined and implemented a source-to-source 
transformation that takes as input a source program and either obtains from the 
programmer or infers which processes are to be dispatched as tasks. The transfor- 
mation generates a new program that is capable of both generating and executing 
these tasks. Automation of the transformation process made it easy to experiment 
with alternative partitioning and scheduling strategies. 


11.6.4 Performance Studies 


Preliminary performance studies of the bilingual alignment program were con- 
ducted on an Encore Multimax. This machine consists of 20 National Semicon- 
ductor 32332 computers and 64 MBytes of shared memory, accessed using a high- 
speed bus. The Strand implementation employed for these experiments used the 
shared-memory simply to simulate message-passing. Hence, the results of these 
performance studies can also be expected to apply to message-passing machines. 
The studies involved a single, relatively small, test data set and two different 
transformations of the alignment program. 

The test data set used for this investigation incorporated annotations provided 
by a biologist. These annotations permit the initial align_chunk problem to be
immediately divided into a number of independent sub-alignment problems. Our 
first attempt at transforming the alignment program for parallel execution only 
created a task for each such sub-alignment. Execution times on a varying number 
of computers are listed in Table 11.1. Except for N = 1, N computers were used 
to execute N — 1 workers and a single manager. 


Table 11.1: Parallel Execution 


No. of Computers | Time (secs)





The transformed program was not found to execute appreciably slower than the 
original program on a single computer. This is not surprising, as only a few 
additional process reductions are required to create and distribute work. However, 
the transformed program shows a maximum of only 5.4 times speedup on 16 
processors. 

Monitoring of the program was able to explain this result. The largest task 
generated was found to take 478 seconds to execute. In addition, the initial task
which splits the data into subsequences and the terminating task which combines
these subsequences take a total of 36 seconds. Hence, the minimum execution 
time possible is 478 + 36 = 514 seconds: not much less than the best time of 578 
seconds. These figures indicate that ramp-down (and to a lesser extent ramp-up) 
times are significant: Computers are idle while the big task and terminating tasks 
are executing. 
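The arithmetic behind this bound can be written out as a short Python calculation. The three input figures are taken from the text; the variable names are ours.

```python
largest_task = 478        # seconds, largest single task (from the text)
split_and_combine = 36    # seconds, initial plus terminating task (from the text)
best_observed = 578       # seconds, best measured time (from the text)

# No schedule can finish before the serial tasks and the largest task are done.
lower_bound = largest_task + split_and_combine

# Time by which the best run exceeds that bound.
idle_margin = best_observed - lower_bound

# How close the bound is to the best observed time.
fraction_of_bound = lower_bound / best_observed
```

The bound accounts for roughly 89 percent of the best observed time, which is why adding computers cannot help much until the largest task is itself split.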

A second attempt at transformation sought to reduce task size by dispatching 
align_chunk and combine processes generated by the divide-and-conquer algorithm
(Rule R5 in Program 11.1). This is essentially the transformation illustrated
in Section 11.6.3. An initial experiment showed that dispatching all align_chunk
tasks did not lead to performance gains. This was attributed to the creation of 
too many small tasks. Hence, we chose to only dispatch tasks for which the size of 
the input chunk exceeded a certain threshold. This threshold, expressed in terms 
of the number and size of the sequences comprising the input chunk, was made a 
parameter of the program so as to permit experimentation with different values. 
Results obtained for various threshold values on 11 computers are summarized in 
Table 11.2. S is the number of sequences to be aligned; L is the length of the first 
sequence. 

Table 11.2 shows that the number of tasks increases as the threshold is pro- 
gressively decreased. The execution time first reduces and then increases as tasks 
become too small. 


Table 11.2: Effect of Task Size on Run-time: 11 Computers. 


 S  |  L  | No. of Tasks | Mean Time (secs) | Total Time (secs)
 5  | 50  |     269      |       10.7       |       304


Using 16 computers, a best time of 350 seconds was achieved with S=5, L=50. 
This represents a speedup of 8.9: considerably better than that achieved using the 
first transformation. We expect to achieve even better results on larger problems 
and using alternative transformations. 

In summary, the manager-worker scheduler allowed us to achieve a significant 
reduction in execution time for a highly irregular problem. Two concepts helped 
us to achieve good performance: The recognition of data dependencies between 
tasks and the use of run-time tests on the size of input data to determine whether 
to dispatch tasks for remote execution. 

It would be interesting to compare the performance of the bilingual program 
and an equivalent program written entirely in C. However, the Strand rewrite of 
the top levels of the original C program led us to improve the algorithm used. This 
reduced overall run-time and hence prevented comparison of the two programs. 



















11.7 Summary 


It is our belief that computational problems from molecular biology may repre- 
sent one of the most significant uses of computers during the coming decade. The 
problems posed in this area are frequently computation-intensive and parallel com- 
putation may well prove to be required. Hence, we explored the potential use of 
Strand as a vehicle for achieving both ease of programming and effective use of 
multiprocessing capabilities. By coding critical kernels in C, we achieved good 
performance. 

The programming experiment reported in this chapter involved the coding of 
a novel sequence-alignment algorithm in Strand and C. The bilingual program was 
executed on a multiprocessor with encouraging results. A key feature of our ap- 
proach was the use of a manager-worker scheduler to handle an irregular problem. 
Another was the use of a source-to-source transformation of an original program 
to partition a problem at compile-time. Our experience developing an effective 
parallel implementation of the alignment program emphasized the importance of 
a software environment that encourages exploratory programming. 

Our experience with Strand has been gratifying. It has been possible to write 
bilingual programs that are clear, that perform well, and that can be executed on a 
range of available multiprocessing environments. We intend to continue our work 
by implementing a distributed database to support searches for genetic sequences 
that display similarities to a given sequence. Strand offers an attractive tool for 
implementing just such a system. 




Chapter 12 


Discrete Event Simulation 


Martin Gittins 


Strand Software Technologies, 
a division of Artificial Intelligence Ltd 
Watford, England 


12.1 Introduction 


Artificial Intelligence Ltd is the commercial supplier of an interactive simulation 
product called Stem. This product brings the high performance graphics found 
on Lisp workstations to the simulation world, allowing interactive development of 
simulations. Stem is designed to provide a set of features that can easily be ex- 
tended to support special simulation operations. Typical applications of Stem are 
modeling of network protocols for automobiles, simulation of ultrasonic reflections 
and job shop modeling. As the size and accuracy of such simulations increase,
processing requirements are rapidly exceeding the capabilities of current worksta- 
tions. Our work on building a simulation system in Strand represents an initial 
investigation into extending Stem to operate on parallel computers. 

Simulation is concerned with representing some facet of a physical system in a 
computer program, such that data can be derived that cannot be obtained from 
the physical system for reasons of cost or availability. Normally the facet concerned 
relates to the timing of the system and the data desired relates to the performance 
of the system under particular loads. One might, for example, be interested in the 
performance under greater stress than could be physically applied to the system, 
or one might wish to explore a set of alternative designs at less cost than creating 
the real system. 

Typical systems range from simple retail outlets through communication sys- 
tems to large systems such as complete factory shop floors. It is important to 
differentiate simulation from modeling or emulation. Simulation does not attempt
to replicate the behavior of the system as an emulation does, and unlike a model,
the only way results can be obtained is by measurement of the simulation com- 
putation. In fact, real physical systems are impossible to simulate with complete 
accuracy; instead the physical system is mapped to some idealized model system. 
This model system is a manageable subset and simplification of the real system 
that can be simulated. However, the limitations of the physical-to-model mapping 
usually define the limits of the applicability of the simulation study. 

Discrete event simulation is one of many approaches to building simulation 
systems. In this approach, interactions in the model system are assumed to happen 
only at instantaneous points of time, referred to as events. In the time intervals 
between events no interactions are assumed to occur. Thus, these intervals can 
be ignored, and the only computation required relates to the events themselves. 
The intervals may differ significantly between different simulations, or even
within a single simulation. For example, a simulation system for nuclear waste handling may have
events relating to arrival of material separated by minutes and events relating to 
material reaching safe limits separated by decades or longer. All events other than 
some set of initial events are regarded as being the consequence of previous events. 
The simulation is run until some prescribed time limit is reached or all events have 
occurred. 

Discrete event simulation systems conventionally utilize an event queue to 
organize events. This is a queue of events ordered according to the time at which 
they are to occur. A top-level simulation loop then removes the event at the front 
of the event queue and computes the consequence of that event. This results in an 
updated state for the system, but may also create some new events. Any newly 
created events are inserted into the event queue at the correct locations, and the 
next event is taken from the front of the queue and processed in turn. As each 
event is removed from the queue the simulation clock is updated to the time of 
that event [55]. 

As an example, consider a simple bank simulation. The event queue for the 
first three iterations of a typical simulation is shown in Figure 12.1. The system 
starts with a single event: the bank opening at 9:30. This event is removed and 
processed, causing two new events to be added: the arrival of the first customer 
and the bank closing at 15:30 (this is a British bank). The arrival of the first 
customer gives rise to two more events: the first customer reaching the teller 
and the arrival of the second customer. It is common practice for the arrival or 
generation of some entity to schedule the arrival of the next. 
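The event queue loop can be sketched in a few lines of Python. This is an illustration rather than the Strand implementation discussed later: the function run and the table CONSEQUENCES are hypothetical names, the event times are those shown in Figure 12.1, and the standard heapq module is used to keep events in time order.

```python
import heapq

# Events that each event gives rise to, using the times from the bank example.
CONSEQUENCES = {
    "bank opens": [("09:31:15", "cust 1 arrives"), ("15:30:00", "bank closes")],
    "cust 1 arrives": [("09:31:45", "cust 1 at teller"),
                       ("09:33:45", "cust 2 arrives")],
}


def run(consequences, first_event):
    """Process events in time order until the queue is empty."""
    queue = [first_event]
    trace = []
    while queue:
        clock, name = heapq.heappop(queue)      # event at the front of the queue
        trace.append((clock, name))             # the clock advances to this event
        for new_event in consequences.get(name, []):
            heapq.heappush(queue, new_event)    # inserted at the correct position
    return trace


trace = run(CONSEQUENCES, ("09:30:00", "bank opens"))
```

Zero-padded time strings are used so that ordinary string comparison orders the events by time.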

This event queue mechanism is quite easily programmed and maps directly 
into a good algorithm for a single computer. However, it is a sequential algorithm. 
Each event is processed in turn and processing of the next event cannot begin until 
the previous event is complete. This is because the system does not know whether 
a new event might be created which should occur before the next event in the 
queue. One can conceive of various ways in which the event queue could be split 
into a number of smaller queues and each queue managed independently. This 
might be a reasonable approach if parts of the simulation have little interaction 
with the rest. However, as long as some interaction exists, it is necessary to
synchronize the time of the various parts. This is hard to achieve without using
some mechanism that is as sequential as the event queue.


    Time       Event queue contents
    9:30:00    9:30:00 Bank Opens
    9:31:15    9:31:15 Cust 1 arrives;  15:30:00 Bank Closes
    9:31:45    9:31:45 Cust 1 at teller;  9:33:45 Cust 2 arrives;  15:30:00 Bank Closes


Figure 12.1: Event queue during first three iterations of bank simulation.

A number of concurrent algorithms for simulation have been proposed [54,41]. 
Though perhaps not the most effective, the time warp algorithm [41] is certainly 
one of the more interesting. In essence, this seeks to resolve synchronization 
problems by ignoring them and fixing things up afterwards. The effect of ignoring 
synchronization is that a logical process may receive a message which relates to a 
time in its past. This situation can be accommodated if we allow the time of each 
logical process to move backwards (roll back) as well as stepping forwards. The 
ability to return from the present time, $t_p$, to some earlier time, $t_k$ ($k < p$), requires
that the state of the logical process at that time, $S_k$, is still known. In fact, all
previous states, $S_0 \ldots S_p$, must be known for all logical processes. This obviously
takes a lot of space. In addition, the states $S_{k+1}$ to $S_p$ must be recalculated when
roll-back occurs to allow for any changes, and any interaction with other logical 
processes must be redone. 
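A minimal Python sketch of roll-back follows. It is not the Strand code of this chapter: the class LogicalProcess, its fields and the toy state update (state = 2*state + value, chosen only because it is order-sensitive) are assumptions made for illustration, and interaction with other processes is omitted.

```python
import bisect


class LogicalProcess:
    """Keeps every past state so that it can roll back to any earlier time."""

    def __init__(self):
        self.inputs = []      # messages processed so far, in time order
        self.states = [0]     # states[i] is the state after the first i messages
        self.rollbacks = 0

    def receive(self, time, value):
        # Position k at which this message belongs in time order.
        k = bisect.bisect_right(self.inputs, (time, value))
        if k < len(self.inputs):
            # The message lies in our past: forget the states after position k.
            self.rollbacks += 1
            del self.states[k + 1:]
        self.inputs.insert(k, (time, value))
        # Recompute forward from the saved state at position k.
        for _, v in self.inputs[k:]:
            self.states.append(2 * self.states[-1] + v)


lp = LogicalProcess()
lp.receive(10, 1)
lp.receive(30, 3)
lp.receive(20, 2)   # arrives late and forces a roll-back
```

After the late message the final state is the same as if the three messages had arrived in time order, at the cost of storing every intermediate state and of recomputing the later ones.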

At first sight this may seem too high a price to pay for resolving the synchro- 
nization problem. One observation should be made immediately: if each logical 
process is executing on its own physical computer, then the burden of roll-back is 
carried by the computer which is furthest in the future and no burden is carried by 
the slowest computer. Thus the overhead for lack of synchronization is correctly 
located. In addition there is considerable scope for optimization. 

First, no message can be generated with a time earlier than the clock of the
slowest logical process. Therefore all states relating to times earlier than
this global past can be forgotten. In addition, states often do not alter greatly 
between time steps. In consequence, state information can be shared across states 
for different times, possibly at the expense of some time overhead. 

A second class of optimization concerns the mechanism used to undo the results 
of sending messages that may now be invalid. Such messages can be canceled by 
sending an anti-message. If at time $t_{k+1}$ a message $m_{k+1}$ was sent, then when
the logical process rolls back it sends the corresponding anti-message, $a_{k+1}$. Three
things can happen to this anti-message. When it arrives at the destination its
matching message may have been processed, in which case it will initiate roll-back.
Its matching message may still be queued, in which case the message and
anti-message pair can mutually annihilate each other with no further processing. 
The last possibility is that it may overtake its message en-route. In this event 
mutual annihilation will occur when the two meet. The mutual annihilation avoids 
redundant processing. 
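The three cases can be made concrete with a small Python sketch of a receiving process's input handling. The class Inbox and its method names are hypothetical, messages are identified by a simple key, and roll-back itself is reduced to a counter.

```python
class Inbox:
    """Tracks messages and anti-messages arriving at one logical process."""

    def __init__(self):
        self.pending = set()       # messages queued but not yet processed
        self.processed = set()     # messages already processed
        self.early_antis = set()   # anti-messages that overtook their message
        self.rollbacks = 0

    def deliver(self, key):
        if key in self.early_antis:
            self.early_antis.remove(key)   # third case: annihilate on meeting
        else:
            self.pending.add(key)

    def process(self, key):
        self.pending.remove(key)
        self.processed.add(key)

    def deliver_anti(self, key):
        if key in self.processed:
            self.processed.remove(key)     # first case: must roll back
            self.rollbacks += 1
        elif key in self.pending:
            self.pending.remove(key)       # second case: annihilate in the queue
        else:
            self.early_antis.add(key)      # third case: wait for the message


box = Inbox()
box.deliver("a"); box.process("a"); box.deliver_anti("a")   # already processed
box.deliver("b"); box.deliver_anti("b")                     # still queued
box.deliver_anti("c"); box.deliver("c")                     # anti-message first
```

Only the first case costs a roll-back; in the other two the pair disappears without any event processing.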

Third, it is observed that when roll-back occurs the messages that are sent on 
roll-forward are often the same as those sent previously. Therefore, if the sending 
of anti-messages is delayed until roll-forward, it is possible to compare the messages 
originally sent with those now being sent and only send the differences. 
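This third optimization amounts to a set difference. In the Python sketch below, the function name lazy_cancel and the representation of messages as (time, data) pairs are assumptions made for illustration.

```python
def lazy_cancel(previously_sent, now_sent):
    """Compare the messages sent before roll-back with those sent on roll-forward.

    Returns (antis, fresh): the old messages that must be cancelled with
    anti-messages, and the new messages that still have to be sent.
    Messages present in both lists need no traffic at all.
    """
    old, new = set(previously_sent), set(now_sent)
    antis = sorted(old - new)
    fresh = sorted(new - old)
    return antis, fresh


antis, fresh = lazy_cancel([(5, "x"), (7, "y")], [(5, "x"), (8, "z")])
```

Here the message (5, "x") is regenerated unchanged, so neither an anti-message nor a resend is needed for it.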

The purpose of this case study is to examine how some of these issues can be 
tackled in Strand. To this end two forms of a simulation system were constructed:
a sequential version with a direct implementation of an event queue, and a time warp
version utilizing the optimizations discussed. In order to evaluate the practicality
of building such a simulation system we needed a physical system to model. A
simple model of a switching circuit, such as might be found in a public switched
telephone network (PSTN), was chosen and is described in the next section.


12.2 Problem Domain 


A switching system was chosen for a number of reasons. The primary reason was 
that the problem consists of a small number of physical entities, namely: phones, 
lines and switches. This means that the quantity of domain code that has to 
be written is limited. In addition, it is possible to create a switching system of 
any size by increasing the number of components. A common observation when 
dealing with multiprocessor systems is that large problems are required for effective 
computation. 

Any real switching system has a complex topology and generally has switching 
capacity for many more circuits than actually exist. In consequence, not every 
number is valid. In order to avoid the complexity of dealing with non-existent 
circuits the model used has a regular structure with all circuits in service. 

Two types of logical process exist in the simulation: phones and switches. The 
switches form a tree with a root switch at the top. This switch is numbered {0}. 
The switches in the layer below are numbered {0,0} to {0,9}, such that switch 
{0,k} is connected to the kth line of the root switch. Switches in the third layer 
are numbered {0,0,0} to {0,9,9} in an analogous manner. Phones have a single 
line into a switch and form the leaves of the tree. They are numbered in the 
same manner as the switches: for example, if there are four layers of switches, 
the phones are numbered {0,0,0,0,0} to {0,9,9,9,9}. The total number of phones
is $10^n$, where $n$ is the number of switch layers. The structure of the switches is
shown in Figure 12.2. 
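The numbering scheme is easy to check with a few lines of Python. Addresses are written here as tuples, and the helper names phone_count, parent and line_on_parent are ours, not part of the simulation code.

```python
def phone_count(switch_layers):
    """Each switch has 10 input lines, so n layers of switches serve 10**n phones."""
    return 10 ** switch_layers


def parent(address):
    """Address of the switch a device is attached to: drop the last digit."""
    return address[:-1]


def line_on_parent(address):
    """Input line of the parent switch to which this device is connected."""
    return address[-1]


last_phone = (0, 9, 9, 9, 9)     # highest-numbered phone with four switch layers
```

For example, phone {0,9,9,9,9} hangs off line 9 of switch {0,9,9,9}, and repeatedly taking the parent leads back to the root switch {0}.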

A switch is defined to be a device which has 11 cables attached to it. Ten of 
these cables, the input lines, go to subordinate devices (switches or phones), and 
the 11th forms a parent cable to a higher level switch. The required tree structure
is therefore created by connecting the parent lines of 10 switches into the input
lines of a higher level switch. The root switch has a null parent cable. Each cable
may consist of a number of logical lines, one logical line being required for each
connection. The number of logical lines can be varied, and in principle can be
different for different switch layers or even for different switches in the same layer.


Figure 12.2: The PSTN Model

A phone is a primitive device with one cable into it. It is only capable of 
handling one logical line (i.e. one connection). Our model also includes a customer 
within the logical phone that periodically makes calls to random numbers and, if 
successful in getting a reply, talks for a while to the other phone. If a call attempt 
is unsuccessful, or when a call is completed, the phone waits for a period and then 
attempts another call. We assume that the caller always hangs up the connection 
rather than the callee. Phones will therefore perform the following actions at 
appropriate times: 


make-call, hang-up and connect. 
They also understand responses from a switch of the form: 
fail (engaged) and ack (connected). 


A phone will therefore expect to deal with the following events, or messages,
and will take the appropriate actions:


start
    1. Enqueue a make-call event for a random time later.

make-call (from self)
    1. If state is free, generate a number to call and issue a connect
       request to local switch, and then wait for a fail or ack. If state is
       not free, then enqueue a make-call event and exit.
    2. Set state to calling.

hang-up (from self)
    1. Send a disconnect request to the local switch.
    2. Set state to free.
    3. Enqueue a make-call event for later.

connect (from switch)
    1. If state is free then send ack event to the switch; otherwise send
       fail.
    2. Set state to engaged.

disconnect (from switch)
    1. Set state to free.

fail (from switch)
    1. Set state to free.
    2. Enqueue a make-call event for later.

ack (from switch)
    1. Set state to engaged.
    2. Enqueue a hang-up event for later.


Switches only perform actions when requested. They respond to requests to:


connect a call
    1. If line is available, then send connect request to the next device;
       otherwise send fail to calling device and exit.
    2. Set line state to be engaged for incoming and outgoing lines.
    3. Update connection table to show new connection.

forward an ack
    1. Forward ack message to connected device.

disconnect
    1. Send disconnect request to connected device.
    2. Set line state to be free for incoming and outgoing lines.
    3. Remove entry from connection table.

fail
    1. Send fail message to connected device.
    2. Remove entry from connection table.
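The phone's behavior listed above is a small state machine, and can be sketched as a pure Python function from (state, message) to (new state, actions). The function name phone and the action tuples are hypothetical. One point is our assumption rather than something the list states: when a connect request arrives at a phone that is not free, the sketch sends fail and leaves the state unchanged.

```python
def phone(state, msg):
    """Return (new_state, actions) for one event delivered to a phone.

    Actions are (kind, what) pairs: ("send", x) goes to the local switch,
    ("enqueue", x) schedules a later event for the phone itself.
    """
    if msg == "start":
        return state, [("enqueue", "make-call")]
    if msg == "make-call":
        if state != "free":
            return state, [("enqueue", "make-call")]   # try again later
        return "calling", [("send", "connect")]
    if msg == "hang-up":
        return "free", [("send", "disconnect"), ("enqueue", "make-call")]
    if msg == "connect":
        if state != "free":
            return state, [("send", "fail")]   # assumption: state is unchanged
        return "engaged", [("send", "ack")]
    if msg == "disconnect":
        return "free", []
    if msg == "fail":
        return "free", [("enqueue", "make-call")]
    if msg == "ack":
        return "engaged", [("enqueue", "hang-up")]
    raise ValueError(f"unknown message: {msg}")


# One successful outgoing call, from the caller's point of view.
state = "free"
history = []
for message in ["start", "make-call", "ack", "hang-up"]:
    state, actions = phone(state, message)
    history.append((message, state, actions))
```

Writing the handler as a function of its input state, rather than as an object holding the state, anticipates the representation adopted in Section 12.3.1.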


12.3 Representation Issues 


In this section we discuss several issues concerning implementation in Strand, and 
in particular those issues relating to the problem domain. These issues are of course 
a feature of any simulation implementation. The requirements for synchronization 
and control, in the form of event queue and time warp management, are discussed 
in the following sections. 


12.3.1 Logical Process Representation 


Our simulation domain is naturally formulated in terms of logical processes which 
have state and respond to messages. The most direct way of representing a logical 
process in Strand is as a perpetual process. As part of its state information each 
perpetual process knows about the other processes with which it can communicate. 
In this particular domain the network of logical processes is static and the set 
of “acquaintances” can hence be fixed at the time a process is created. Other 
problem domains may require that the set of acquaintances be updated during 
the simulation. 

The original version of the simulation code implemented each logical process 
as a single perpetual process. Program 12.1 shows a phone implemented in this 
manner. InState holds the phone’s local state. Note that the phone has no ac- 
quaintances; it only has an input stream. 


phone([{Msg,Time,Events}|Msgs],InState) :-
    do_phone(Msg,Time,InState,OutState,Events),
    phone(Msgs,OutState).


Program 12.1: A Phone Object 


However, such a direct implementation is not suitable for a time warp system. 
The phone process maintains state as a local argument. But by its very nature a 
time warp process needs to maintain many states. State maintenance functions are 
the same for all devices and domains and should therefore be handled by generic 
simulation code. We thus remove state information from the phone process and 
incorporate it in each message as shown in Program 12.2. 


phone([{Msg,Time,InState,OutState,Events}|Msgs]) :-
    do_phone(Msg,Time,InState,OutState,Events),
    phone(Msgs).


Program 12.2: A Phone Object with External State Control 
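The division of labor in Program 12.2 can be mimicked in Python: the domain code becomes a function that receives its input state with each message, and a generic wrapper keeps the states. The names Keeper and do_counter, and the toy counter domain, are assumptions made for illustration only.

```python
def do_counter(msg, time, in_state):
    """Stateless domain code: everything it needs arrives with the message."""
    out_state = in_state + msg
    events = [(time + 1, "tick")] if out_state < 3 else []
    return out_state, events


class Keeper:
    """Generic simulation-side code that owns the states of one logical process."""

    def __init__(self, domain, initial_state):
        self.domain = domain
        self.saved = [(None, initial_state)]   # (time, state) pairs, oldest first

    def deliver(self, msg, time):
        in_state = self.saved[-1][1]
        out_state, events = self.domain(msg, time, in_state)
        self.saved.append((time, out_state))
        return events

    def restore(self, time):
        """Discard states saved at or after `time`; return the state before it."""
        while self.saved[-1][0] is not None and self.saved[-1][0] >= time:
            self.saved.pop()
        return self.saved[-1][1]


keeper = Keeper(do_counter, 0)
first_events = keeper.deliver(1, 10)
keeper.deliver(1, 20)
state_before_20 = keeper.restore(20)
```

Because the domain function holds no state of its own, the same wrapper could drive a phone, a switch or any other device, which is the point of moving state into the messages.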


12.3.2 Domain Isolation 


Another major issue is isolation of the domain code from the simulation control 
code. This is desirable in this case study because it allows the same domain code 
to be used in both simulations. More generally, of course, it allows the same 
simulation systems to drive a variety of domains. 

The simulation is driven by a generic simulation controller that maintains 
the simulation clock, or clocks, determines the next event to be processed, detects 
termination and collects statistics concerning the simulation run. In the sequential 
event queue case the controller manages the event queue and calls the domain code 
for each event. In the time warp case the controller must operate in a distributed 
fashion and manage multiple states per logical process. 

The isolation of domain and simulation code is achieved by providing each 
domain process with a stream to the simulation system. Each process registers 
with the simulation controller, at which point the controller returns an address to 
be used by other processes to refer to this new process in subsequent messages. 
The relationship between domain and simulation code is shown in Figure 12.3. 








Simulation Code Domain Code 


add object 
Spawning 


Running 


Figure 12.3: Relationship of Domain and Simulation Code 


The simulation controller is implemented as a set of router processes. One 
command that the routers understand is a request to register a process. The top-
level code that implements this is shown in Program 12.3. When a router process
receives an incomplete message add_object(Obj,Addr) it spawns an add_object process
to update its tables and returns to the domain code the address (Addr) by which
that code should refer to the object.


router([add_object(Obj,Addr)|In],{A,MyId,R,OCount,Chunk},Routers,Objs) :-
    Index is OCount + 1,
    Addr := {MyId,Index},
    add_object(Obj,Index,Objs,NewObjs),
    router(In,{A,MyId,R,Index,Chunk},Routers,NewObjs).


Program 12.3: Adding an Object 
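The registration step of Program 12.3 has a direct Python analogue, shown below as a sketch. The class Router and its attribute names are ours; the address is a pair of router identifier and object index, as in the Strand rule.

```python
class Router:
    """Registers objects and hands back the address by which to refer to them."""

    def __init__(self, my_id):
        self.my_id = my_id
        self.count = 0        # number of objects registered so far
        self.objects = {}     # index -> object

    def add_object(self, obj):
        self.count += 1                       # Index is OCount + 1
        self.objects[self.count] = obj
        return (self.my_id, self.count)       # Addr := {MyId, Index}


router = Router(3)
phone_addr = router.add_object("phone")
switch_addr = router.add_object("switch")
```

In Strand the address is returned by assigning the variable Addr inside the incomplete message; the Python method simply returns it.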


12.3.3 Detecting Completion 


It is a requirement of the event queue model that each event be dispatched for 
processing only when the previous event has completed. A mechanism for detecting
termination of event processing is required. In general, we detect termination
of a set of processes using a short-circuit. However, in this case a simpler solu- 
tion is possible. We provide a separate process, responsible for constructing the 
new state, that waits until all components of the new state are available before 
proceeding. As the next event cannot proceed until this new state is constructed, 
events are sequenced correctly. 

Program 12.4 shows a fragment of the code for disconnecting a switch. Note 
the use of the bind_switch process. This waits until all new state components and 
the new Events are available before constructing OutState. 


switch([{Msg,Time,InState,OutState,Events}|Msgs]) :-
    do_switch(Msg,Time,InState,OutState,Events),
    switch(Msgs).


do_switch(disconnect(Who,Lineno),Time,{Self,Stat,Lines,CT},OutSt,Es) :-
    cable_from_addr(Who,Stat,...),
    update_CT_entry(CT,...,NewCT),
    update_lines(Lines,...,NewLines),
    forward_disconnect(...),
    bind_switch(Es,Self,Stat,NewLines,NewCT,OutSt).


bind_switch(Events,Self,Stat,Lines,CT,OutState) :-
    data(Events), data(Self), data(Stat), data(Lines), data(CT) |
    OutState := {Self,Stat,Lines,CT}.


Program 12.4: Switch Code, Showing Use of bind_switch 




12.3.4 Spawning the Network 


The network of switches and phones needs to be created so that each device 
knows the address of the devices it is connected to. In addition, when more 
than one computer is involved, the devices need to be allocated to particular 
computers. Since the traffic pattern is reasonably static, dynamic load balancing 
is unnecessary. Hence, we employ a simple static distribution scheme. 

The network is spawned as a hierarchy of processes. Each switch-creating call 
initiates all the switches below it in the tree as shown in Figure 12.4. Information 
is passed down the tree in a regular manner. A partial implementation of switch 
spawning is shown in Program 12.5. The create_switch process creates spawn_switch
to spawn the next layer of switches (R1). If size is 10, then it is time to spawn
phones, and a different create_switch rule (not shown) is selected. Recursive calls
to the spawn_switch process create a total of 10 sub-switches (R2,3).


quota = 12 


{0,5,0} {0,5,1} {0,5,2} 
quota = 12 quota = 2 quota = 60 
node = 1 node = 1 node = 2 


Figure 12.4: The Spawning Hierarchy 


The allocation of devices to computers should maintain locality in communi- 
cation: devices that are directly connected should be on the same node if possible. 
This is achieved using a quota system. Before spawning, the number of phones 
per node is calculated; for example, if a four layer system with 10,000 phones is to 
execute on a 64-node computer, then 156 phones must be located per node. This 
number is passed as an initial quota to the top-level switch. The spawning pro- 
gram is executed on a ring virtual machine. As each subordinate switch is created, 
an amount equal to the number of phones controlled by the subordinate switch 
is subtracted. If the quota becomes negative, then the initial loading is added, 
but the spawning continues on the next computer in the ring. At each step, the
remaining quota is passed down to the newly created switch to be used in the
next round of allocation. The advantage of this mechanism is that it performs
reasonably well over all reasonable ranges of numbers of computers and phones.


create_switch(Router,Parent,Num,Size,Level,Tb,Te) :-             % R1
    Size > 10 |
    Router := [merge(R1)|R2],
    do_create(R1,Parent,Num,Level,Self),
    NextSize is Size/10,
    NextLevel is Level + 1,
    bump_number(Num,Nn),
    spawn_switch(0,R2,Self,Nn,NextSize,NextLevel,Tb,Te).


spawn_switch(Cnt,R,P,N,Size,L,Tb,Te) :-                          % R2
    Cnt < 10 |
    Cnt1 is Cnt + 1, R := [merge(R1)|R2],
    fix_number(N,Cnt,Me),
    create_switch(R1,P,Me,Size,L,Tb,Tm),
    spawn_switch(Cnt1,R2,P,N,Size,L,Tm,Te).
spawn_switch(10,R,_,_,_,_,Tb,Te) :- Tb := Te, R := [].           % R3


Program 12.5: Implementation of Switch Spawning
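The quota scheme can be approximated by a flat Python loop. This is a deliberate simplification and not the hierarchical Strand implementation: it assumes the lowest-level switches (10 phones each) are placed one after another, and the function name allocate and its parameters are hypothetical. The figures of 10,000 phones, 64 nodes and a quota of 156 come from the text.

```python
from collections import Counter


def allocate(num_switches, phones_per_switch, quota_per_node, num_nodes):
    """Place switches on a ring of nodes, moving on when the quota is used up."""
    quota, node, placement = quota_per_node, 0, []
    for _ in range(num_switches):
        placement.append(node)
        quota -= phones_per_switch        # subtract the phones this switch controls
        if quota < 0:
            quota += quota_per_node       # add the initial loading ...
            node = (node + 1) % num_nodes # ... and continue on the next node
    return placement


placement = allocate(num_switches=1000, phones_per_switch=10,
                     quota_per_node=10000 // 64, num_nodes=64)
per_node = Counter(placement)
```

Consecutive switches land on the same node until its quota is exhausted, which keeps directly connected devices together while spreading the load over every node.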

The simulation must not begin until spawning is complete. The short-circuit 
technique is used to detect termination of spawning. The short-circuit is threaded 
through all of the spawn_switch processes using the parameters Tb and Te. The
circuit is closed in spawn_phone, so the top-level short-circuit is closed when all
phones have been created. 


12.4 A Sequential Event Queue Solution 


Recall that an event queue simulation system consists of an event queue, which 
maintains a set of events in time order, and a controller, which repeatedly removes 
the next event from the front of the queue and passes it to the domain code for 
execution. This in turn generates new events to be inserted into the queue. 

The components required to form an event queue solution are shown in Fig- 
ure 12.5. The various entities are linked by streams; the messages that pass 
between them are indicated. Note that for the reasons outlined earlier each log- 
ical process is represented as two processes, a simulation process and a stateless 
domain process. The structure of the simulation process is very simple and is not 
discussed here. The domain process has already been discussed, which leaves only 
the event queue and the controller to be discussed in this section. 


[Diagram: Controller, Simulation Process and Domain Process linked by streams.
 Messages shown: insert(Time,{Dest,Data}); dequeue(Time,{Dest,Data});
 {Time,Data}; Events; {Data,Time,InState,OutState,Events}.]


Figure 12.5: Messages Passed Between Objects





An event queue can be represented in a number of ways in Strand. A partic- 
ularly elegant representation treats the queued events as processes. Each process 
in the queue waits for messages from the process in front. When nothing is hap- 
pening, the entire queue is suspended, each process waiting for a message from 
its neighbor. When an insert request arrives, it is passed as a message to the 
head of the queue. This process checks to see whether the message should be 
inserted before or after itself. If it should be inserted before, then a new process is 
spawned to represent the new event. This new process waits on the input stream 
and the old process waits on a new stream from the newly created process. If the 
message cannot be inserted before, then it is passed to the next process which 
performs the same test. An implementation of this insert operation is presented 
in Program 12.6. 


queue([insert(Time,State)|More],MyTime,MyState,Next) :-
    Time < MyTime |
    queue(More,Time,State,Link),
    queue(Link,MyTime,MyState,Next).
queue([insert(Time,State)|More],MyTime,MyState,Next) :-
    Time >= MyTime |
    Next := [insert(Time,State)|Next1],
    queue(More,MyTime,MyState,Next1).


Program 12.6: Inserting Into an Event Queue 
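The same insertion logic can be followed in a sequential setting. In the Python sketch below each queue entry is a linked node instead of a process, and a method call replaces the message passed along the stream. The class Node and the helper contents are our names, and the handling of the end of the queue is an assumption, since Program 12.6 does not show that case.

```python
class Node:
    """One queued event; plays the role of one queue process."""

    def __init__(self, time, state, next_node=None):
        self.time, self.state, self.next = time, state, next_node

    def insert(self, time, state):
        """Insert an event and return the node now occupying this position."""
        if time < self.time:
            return Node(time, state, self)      # new entry goes in front of us
        if self.next is None:
            self.next = Node(time, state)       # assumed end-of-queue behavior
        else:
            self.next = self.next.insert(time, state)   # pass the request on
        return self


def contents(head):
    """List the (time, state) pairs from the front of the queue."""
    out = []
    while head is not None:
        out.append((head.time, head.state))
        head = head.next
    return out


head = Node(2, "b")
head = head.insert(1, "a")
head = head.insert(3, "c")
head = head.insert(2, "b2")
```

As in the Strand rules, an event whose time equals that of an existing entry is placed after it, so events with equal times keep their order of insertion.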




The implementation of the dequeue operation is even simpler: the time and 
state recorded in the first process are output and its input stream is passed to the 
next entry. 

The top loop of the event queue simulation is shown in Program 12.7. Events 
are repeatedly requested from the queue (R2) and processed (R3) until the queue 
signals that it is empty (R4). Note the use of a Done flag to synchronize the 
sending of events: It is initially set (R1) and is included in each event (R3). This 
flag is set by the simulation process once the domain process has assigned its 
output state. This is used to ensure that each event is processed fully before the 
next is dequeued (R2). The variable P is assigned the value done upon completion 
(R4). 


run_sim(Eventq,P) :- run_sim1(Eventq,P,done).          % R1


run_sim1(Eventq,P,done) :-                             % R2
    Eventq := [dequeue(T,State)|Eventq1],
    process_event(State,T,Eventq1,P).


process_event({Data,Dest},T,Eventq,P) :-               % R3
    Dest := [{Data,T,Done}],
    run_sim1(Eventq,P,Done).

process_event([],_,Eventq,P) :-                        % R4
    close(Eventq), P := done.


Program 12.7: Top-Level of the Controller Loop 


12.5 A Concurrent Time Warp Solution 


Recall from the introduction that time warp is a simulation technique that allows 
the time for each process in a simulation to vary independently. This can result 
in processes receiving messages that refer to their past. At this point the process 
must roll back to a time before that message arrived and reprocess. In the course of 
rolling back, messages that it sent to other processes must be undone because they 
are no longer correct. This is achieved by sending anti-messages. An anti-message 
and message form a matching pair and whenever they meet mutual annihilation 
occurs. If one compares Figure 12.6 (an overview of the time warp system) with 
Figure 12.5, one can see that the simple simulation process has been replaced with 
a time warp process. The single controller has been replaced by a set of routers 
and each router, perhaps surprisingly, still uses an event queue. 
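
The annihilation of a message by its anti-message can be illustrated with a few lines of Python. This is a sketch under stated assumptions rather than the mechanism used in the Strand system: it assumes that a message and an anti-message match when they carry the same time and data (following the msg(Time,Data) and anti(Time,Data) forms of Figure 12.6), and the function name deliver and the list representation are hypothetical.

```python
def deliver(pending, kind, time, data):
    """Add a message or anti-message to `pending`, annihilating matched pairs.

    `pending` is a list of (kind, time, data) tuples; kind is "msg" or "anti".
    """
    opposite = "anti" if kind == "msg" else "msg"
    partner = (opposite, time, data)
    if partner in pending:
        pending.remove(partner)      # the pair meets and both disappear
    else:
        pending.append((kind, time, data))
    return pending
```

The order of arrival does not matter in this sketch: an anti-message that arrives before its message simply waits in the list until the message turns up.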

The Routers and Global Past Time. Routers pass messages around the
simulation system. Message passing could be handled directly using basic Strand
mechanisms but the routers provide a number of additional functions. They optimize
throughput by maintaining time order on each computer. In addition, they
provide a mechanism for handling global time and a means to collect statistics. For
reasons of space, discussion of statistics collection and the router communication
structure is omitted.


[Figure: labeled elements are the message forms insert(Time,{Type,Dest,Data}) and
dequeue(Time,{Type,Dest,Data}); the message forms msg(Time,Data) and anti(Time,Data);
the Timewarp Object; the tuple {Data,Time,InState,OutState,Events}; and the Domain Object.]


Figure 12.6: Time Warp Structure

The maintenance of time order merely means that the routers deliver local 
messages in approximate time order, like an event queue. This optimization avoids 
redundant computation. The event queue described earlier is put to good use 
performing this task. 

Three clock values are relevant to a discussion of time warp simulation. Global 
past time (gpt) is the maximum time known to be past on all computers. Local 
past time (lpt) is the time known to be past on each local computer. Now is the 
current time on each computer. 

A knowledge of global past time permits us to discard old states. Global past 
time can be determined as follows. One router is a time master and the others 
are slaves. The time master and slaves circulate a time message continuously. 
This contains two fields: counter and time. The time represents a proposed value 
for gpt. The counter is used to control the effect of the message. At the end of 
each circuit the master decrements the counter by one, unless it is already zero, in 
which case it initiates a new message with the counter set to its maximum value. 

A counter value of zero tells a computer that this is a discard message, that the
time field is a valid gpt, and that all states prior to this time can be discarded. The
computer then resets its value of lpt to now. A non-zero counter value requests
a slave to update the bid (the proposed gpt carried in the time field) with its lpt
if this is earlier. Between discards each computer compares the time of every
message with its lpt and updates the latter if the message refers to an earlier
time. In consequence, whenever a bid is processed, lpt is always the time of the
oldest message seen since the last discard round.
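
The circulating time message can be sketched as follows. This Python code is a simplified, sequential illustration and not the router code itself: the class Router, the functions circuit and run_round, and the choice of the master's lpt as the initial bid are all assumptions made for the example, and messages in transit between computers are not modeled.

```python
class Router:
    def __init__(self, now):
        self.now = now      # current simulation time on this computer
        self.lpt = now      # local past time
        self.gpt = None     # last global past time learned

    def observe(self, message_time):
        # Between discards, lpt tracks the oldest message time seen.
        self.lpt = min(self.lpt, message_time)

    def handle(self, counter, bid):
        if counter == 0:
            # Discard message: bid is a valid gpt; older states may be dropped.
            self.gpt = bid
            self.lpt = self.now
            return bid
        # Non-zero counter: lower the bid to our lpt if that is earlier.
        return min(bid, self.lpt)


def circuit(routers, counter, bid):
    """Pass one time message once around all routers; return the final bid."""
    for router in routers:
        bid = router.handle(counter, bid)
    return bid


def run_round(routers, max_counter):
    """Circulate the message until a discard circuit completes; return the gpt."""
    master = routers[0]
    counter, bid = max_counter, master.lpt   # assumed initial bid
    while True:
        bid = circuit(routers, counter, bid)
        if counter == 0:
            return bid
        counter -= 1                          # master decrements after each circuit
```

With three routers whose clocks read 10, 12 and 15, and with the second router having seen a message for time 8, run_round(routers, 2) settles on a gpt of 8 and every router resets its lpt to its own now.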

Time Warp Queue. The time warp process is an extension of the event queue 
concept, but now items are not removed from the queue to be processed. They 
remain instead in the queue until discarded by the advance of global past. This 
means that items can be in one of three states: future, present (called now) or past. 
For items that are in the past, life is quite simple; they await instructions from 
the earlier part of the queue. The only instructions possible are to be discarded or 
to insert a new entry after them (possibly an anti-message). In the event of such 
an insertion they spontaneously become future again. Items in the future have an 
extra complexity: they are waiting to process as well as awaiting instructions. All 
future items are waiting for their input state to be assigned by the previous item, 
and once assigned they can process, generate output and become past. The event 
actually being processed is in a now state to avoid ambiguity as to whether it is 
past or future. The net result is that a new instruction does not pass now until its 
computation has completed. 
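
The rollback behavior that the time warp queue supports can be illustrated with a small sequential sketch. The Python code below is not the Strand implementation described above; it assumes a hypothetical class TimeWarpProcess and a user-supplied step function, keeps every processed event together with the state that preceded it, and reprocesses from the insertion point when a message arrives for an earlier time. Anti-messages and output messages are deliberately left out.

```python
import bisect


class TimeWarpProcess:
    def __init__(self, initial_state, step):
        self.step = step                # step(state, data) -> new state
        self.events = []                # (time, data) pairs, kept in time order
        self.states = [initial_state]   # states[i] is the input state of events[i]

    def receive(self, time, data):
        # Find the insertion point; equal times keep their arrival order.
        times = [t for t, _ in self.events]
        i = bisect.bisect_right(times, time)
        self.events.insert(i, (time, data))
        # Roll back: states computed after position i are no longer valid.
        del self.states[i + 1:]
        # Reprocess from the inserted event onward.
        for _, d in self.events[i:]:
            self.states.append(self.step(self.states[-1], d))

    def discard_before(self, gpt):
        # Drop events, and their saved input states, older than global past time.
        keep = 0
        while keep < len(self.events) and self.events[keep][0] < gpt:
            keep += 1
        del self.events[:keep]
        del self.states[:keep]

    @property
    def state(self):
        return self.states[-1]
```

Because old states are never modified, only dropped or recomputed, successive states may share structure safely. This is the property that single assignment gives the Strand version for free.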


12.6 Discussion 


We implemented the simulation systems described in this chapter and ran them 
successfully on an Intel iPSC/2 hypercube multiprocessor. We discovered some 
interesting issues when debugging the time warp queue mechanism. Perhaps the 
most important was dealing with messages that happen at the same time. We 
did not attempt to make the system give precisely the same results on different 
configurations. A system of 100 phones was used for most of our testing. The 
inter-switch cables generally had two lines and so the capacity of the simulated 
system was much less than the demand. 

The entire point of performing a simulation is to collect data. However, we 
have not presented any techniques for performing this task. An easy mechanism 
is to use the circulating time message to gather statistics and display them at the 
end of each circuit. 

Two features of the Strand language make it particularly well-suited to im- 
plementing the complex state maintenance functions required by time warp. Its 
automatic memory allocation and garbage collection free the programmer from 
the burden of memory management. Languages such as Lisp also provide these 
facilities. Secondly, the single-assignment nature of Strand means that states from 
different times may share data and are guaranteed never to interfere with each 
other. This is not true for Lisp and can represent a difficult programming task. 

In conclusion, the time warp scheme is not the only distributed simulation 
scheme currently under consideration. We also plan to implement and evaluate 
the Chandy-Misra-Bryant algorithm [54]; this appears more suitable for large scale 
multiprocessing. 







Chapter 13 


Programming Telephony 


J.L. Armstrong and S.R. Virding 


Ellemtel Utvecklings AB, 
a company jointly owned by Ericsson and the Swedish PTT. 
S-125 25 Älvsjö, Sweden.


This case study is concerned with the use of Strand as a base for a system to 
program telephony applications. A telephone exchange viewed as a programming 
application presents many interesting problems that are, in many cases, quite 
different from “normal” applications. For example: 


Intrinsically parallel problem. Large exchanges can have as many 
as 200,000 subscribers and must be able to handle many calls simulta- 
neously. 


Severe real-time constraints. Communication protocols and sys- 
tem specifications both impose many real-time constraints on a system. 


Robust architecture. The system must be able to survive and re- 
cover from both hardware and software errors. 


Large code volume. Current systems typically have millions of lines 
of code. 


Communication dominant problem. Communication is a very 
important part of the problem, possibly even more important than 
processor or memory bandwidth. 


A telephone exchange can be viewed as a switch that interconnects differ- 
ent interface hardware, all controlled by a processing unit. Figure 13.1 shows a 
schematic view of an exchange. 


[Figure: schematic with the labeled components Line Interface Unit, Switch, and
Tone Receiver Unit.]


Figure 13.1: Telephone Exchange.


The switch is used both to set up speech and data channels between hardware 
units and to pass control signals between the processing unit and the hardware. 
The processing unit is, of course, concerned with the control logic of the exchange. 
The hardware connected to the switch can include: 


Line Interface Units. Used to interface subscriber telephones.


Tone Sender Units. Used to send tones to subscriber headsets.


Tone Receiver Units. Used to receive push-button tones from tele- 
phones. 


Multi-Party Units. Used to connect several subscribers together 
when conferencing. 


Trunk Line Units. Used to connect trunk lines (i.e., lines to other 
exchanges). 


The system is event-driven by signals from the connected hardware. Input signals 
usually result in computing some form of state change, outputting signals to set 
the hardware to reflect the new state, and then waiting for more input signals. 


13.1 History 


In this section, we summarize some results of a series of experiments performed 
at The Ericsson Computer Science Laboratory in the period 1983-1988. The aim 
of these experiments was to understand and improve the way in which telephony
application programs are written. By telephony application programs we mean
the programs used to directly control a modern digital telephone exchange. 

The first attempts to understand telephony programming were performed us- 
ing a modified Ericsson MD110 [28] private automatic branch exchange (PABX) 
connected to a VAX 11/750. This system was used to program POTS (Plain 
Ordinary Telephony Service) in a number of conventional (imperative) languages 
(Ada, Concurrent Euclid), declarative languages (CCS, LPL, Prolog [22]), object- 
oriented languages (CLU, Smalltalk [40]) and a rule-based expert system (OPS4). 
The results of these experiments indicated that logic programming languages of- 
fered significant benefits, in terms of code volume and clarity, over conventional 
languages for programming telephony applications [23]. The main improvement 
seemed to be due to their rule-based syntax. This, together with pattern-directed 
procedure invocation and powerful data structures, leads to compact and easily 
understandable code. 

At that time (1986), we were convinced as to the suitability of logic languages 
for describing telephony but had encountered several interesting problems con- 
cerned with the efficient execution of these descriptions. One of the most pressing 
problems concerned the emulation of concurrency. In our early experiments, we 
emulated concurrency in Prolog. Benchmarks showed that our system spent more 
than 90% of its time supporting concurrency and only 10% of the time solving the 
problem! Prolog’s backtracking also gave rise to several severe problems. When 
controlling hardware, we want the actions performed by the hardware to be strictly 
deterministic. Having instructed the hardware to perform a certain action, we do 
not want, at a later stage, to have to instruct the hardware to “undo” the effect 
of a previous command. 

Thus, our requirements led us to consider concurrent logic languages since 
their inherent concurrency and deterministic nature were natural in our problem 
domain. In 1986, there were no good, commercially available, concurrent logic 
programming systems. A Parlog [34] system was thus developed by the second 
author. This system was used for further experiments in telephony programming. 

Early experiences indicated that Parlog was not a suitable application language 
for telephony programming. This was mainly due to the large number of extra 
arguments that had to be added to the rules. These additional arguments are 
required to implement the interprocess communication mechanisms required for 
telephony programming. 

We tackled these problems by developing a telephony-specific programming 
notation that could be compiled to a concurrent logic language. The notation 
was introduced by the first author in 1986; subsequent work led to the creation 
of the programming language Erlang, which is described in the remainder of this 
chapter. (Mathematician Agner Krarup Erlang (1878-1929) developed a theory of
stochastic processes in statistical equilibrium that is widely used in the
telecommunications industry.)

In order to make a telephony programming language, we needed to add domain- 
specific abstractions on top of Parlog to reflect the specific problems encountered 
in telephony programming. It was, however, a relatively simple matter to compile
our language into Parlog, since the basic Parlog mechanisms could be easily used
to create the domain-specific mechanisms that were required. 

Initial performance studies showed that the speed of a Parlog-based system 
would probably be insufficient. Hence, we decided to attempt to base an Erlang 
implementation on the simpler and more efficient language Strand. The languages 
were similar enough to make it possible to use most of the experience gained from 
the Parlog version. We also found that many of the more complex features existing 
in Parlog were never used when compiling Erlang code, and only seldom used when 
implementing the Erlang kernel. In the latter cases, it was always easy to program 
around them. Hence, we have taken the approach of compiling our language into 
Strand and running the compiled version with a small run-time kernel written in 
Strand. 

Erlang has been in experimental use for over a year by a number of users 
outside the Computer Science Laboratory. Feedback from the user group has 
been extremely positive and the language has already been improved as a result 
of interaction with the user group. 

We regard the fact that we have successfully transferred a language from a 
research laboratory to a commercial environment as a major success. None of 
the programmers who are now happily using Erlang have had experience with 
any form of declarative language. Most of the programmers had experience with 
assemblers and Pascal-like languages. We feel that Erlang has been accepted by 
the user group because it specifically addresses some of the most common problems 
in telephony programming and has a syntax that is easy to learn and read. 


13.2 Introduction To Erlang 


This section introduces the programming language Erlang. We discuss the moti- 
vation for Erlang and illustrate the use of the language. 


13.2.1 Motivation for Erlang 


A language for programming telephony should satisfy the following requirements. 

Lightweight Processes. Telephony applications typically consist of control 
programs that handle thousands of simultaneous transactions. A local telephone 
exchange can have 100,000 connected lines; of these, 10,000 may be in use at any 
one time. Each line could be controlled by five to ten processes. The system
overhead in creating or destroying one of these processes should be very small.

When we use the word lightweight to describe these processes, it should be 
remembered that Erlang processes are lightweight when compared with processes 
in many traditional programming languages. They are, however, more complex 
entities than Strand processes, which could be termed featherweight. 

Side Effects. Telephony programs spend a lot of time controlling hardware. 
We must be able to strictly sequence the required side effects. While the behavior 
of any particular hardware unit can be modeled as a sequential process, many
different hardware units are simultaneously performing different actions; therefore,
we need to be able to express sequentiality within a process and concurrency 
between different processes. 

Error Recovery. Error recovery must be built into our system at a low level. 
Programming errors and unplanned hardware events should not cause the entire 
system to die. Recovery mechanisms must specifically reflect the kind of error 
recovery required in the event of an unplanned failure. For example, in a tele- 
phone call, all hardware resources allocated to a subscriber must be automatically 
released if any errors are detected during the call. Such error recovery should be 
always present and not the responsibility of the applications programmer. 

Time. Time plays an important role in all telephony applications. Erlang 
provides a special syntax for referring to time (e.g., timeouts). 

Nearness to SDL. The SDL [16] specification language is one of the most 
important languages used in the telecommunications industry to specify the be- 
havior of communicating systems. Specifications written in SDL are widely used. 
Erlang adopts a message-reception mechanism similar to that used in SDL. This 
simplifies the task of turning an SDL specification into an Erlang program. 

Message Order. Unknown delay times in a network may result in messages 
arriving in arbitrary orders. Hence, we may wish to process messages in other than 
arrival order. On other occasions, we may wish to temporarily ignore all messages 
except from a specific process. Erlang’s interprocess communication mechanism 
supports these alternative types of message reception, which must be explicitly 
programmed in Strand by filtering message streams. The code that performs this 
filtering can be quite complex, hard to understand and error-prone. However, it 
can easily be generated by a compiler. 

This type of message-reception semantics is similar to that in SDL. Erlang 
message-reception semantics are equivalent to an SDL description in which every 
point in the description at which message reception can occur has an SDL save 
construct. 


13.2.2 The Erlang Language 


We now build up a description of Erlang by example. First, we use the factorial 
and append functions presented in Program 13.1 to illustrate the use of Erlang to 
implement simple (non-communicating) processes. 

Erlang programs are functional: Evaluation of a process computes a single 
value which is returned as the result of the process. This is in contrast to Strand, 
where processes can return many values. Erlang’s evaluation mechanism, like that 
of Strand, is based on pattern-directed invocation. When a function is invoked, the 
rules in the function definition are tried sequentially in textual order. In each case, 
an attempt is first made to match the call with the head of the rule. If a head 
match succeeds, an attempt is then made to evaluate the rule’s guard. When a 
successful match and guard evaluation has been made, that rule is chosen and its 
body is evaluated. An error is signaled if no rule matches the call. 

Guards in an Erlang function can contain only calls to simple tests. They
are similar to guards in Strand and should be considered an extension of head
matching.


factorial(0) -> return(1).
factorial(N) | N > 0 ->
    N1 = N - 1,
    F1 = factorial(N1),
    Fac = N * F1,
    return(Fac).


append([], X) -> return(X).

append([Head|Tail], X) ->
    Tail1 = append(Tail, X),
    return([Head|Tail1]).


Program 13.1: Factorial and append functions.

Statements in a selected rule body are evaluated sequentially. The return value 
of the function is the value of the last statement in the function body. The syntax 
Var = FunctionCall causes FunctionCall to be evaluated and the return value assigned 
to Var. The arguments to a function call are not evaluated. Hence, it is necessary 
to explicitly save the value of a call in a variable and pass this variable as an 
argument to a subsequent call. This mechanism is used in the factorial function in 
Program 13.1. 

We introduce Erlang’s interprocess communication primitives using some ex- 
amples taken from a simple working telephone control program (POTS). We 
present the implementation of the calling side of a phone call. 

Each subscriber is assumed to start in an idle state. That is, the receiver 
is not lifted, no call is in progress, and no one is trying to call. This will be 
represented by the Erlang function idle(Self), where Self is a reference to some 
object that represents a subscriber telephone. The function idle waits until a valid 
event occurs, i.e., the receiver is lifted or someone tries to call that subscriber. It 
then proceeds. When a call completes, each side of the call reverts to an idle state 
by again calling the function idle. 

Now consider what happens when a conventional telephone call is in progress. 
Assume that the calling subscriber has dialed a valid number and that the called 
number was vacant: No call was in progress. In such circumstances the calling 
subscriber (the A side of the call) will hear a ringing tone, to indicate that the 
phone is ringing, and the called subscriber’s phone (the B side of the call) will be 
ringing. 

The behavior of the A side of the call will be described using a function defini- 
tion ringing_a. This will be expressed using Erlang’s case function, which has the 
following syntax: 


case CaseFunction {
    MatchTerm1 =>
        ...
    MatchTerm2 =>
        ...
    Other =>
        ...
}


Evaluation of a case function involves calling the function CaseFunction and then 
attempting to match the return value with the match term in each case option. 
The options are tried sequentially in textual order. When a successful match is 
made, the body of that case option is executed. It is an error if no match is 
possible. 

A catch-all option can be defined by specifying a variable match term. This 
will of course match any return value; hence, it must be the last option given. This 
technique is used in most of the POTS functions shown here to ignore events we 
are not interested in, the usual action being just to proceed to wait for the next 
event. 


ringing_a(Self,B_Side) ->
    case wait_no_seize1(Self,30) {

        event(Self,on_hook) =>                     % R1
            stop_tone(Self),
            B_Side ! terminate,
            idle(Self);

        event(B_Side,answered) =>                  % R2
            stop_tone(Self),
            start_switch(Self,B_Side),
            conversation_a(Self,B_Side);

        timeout =>                                 % R3
            stop_tone(Self),
            B_Side ! terminate,
            wait_clear(Self);

        Other => ringing_a(Self,B_Side);           % R4

    }.


Program 13.2: The ringing_a Function


The ringing_a function definition is presented in Program 13.2. Note the use of
the case function and the default case option (R4). Program 13.2 also introduces 
Erlang’s message-sending construct. The syntax for this is: 


Destination ! Message 




The Destination component is either a process identifier (which is created when a 
process is created) or a device identifier. A process identifier is a global reference 
to a process in the sense that it can be passed around, included in messages, and 
still be used to send messages to the process it refers to. It is a valid reference to 
a process for the lifetime of that process. The transparency of the destination of 
a message (either process or device) allows great freedom in setting up a commu- 
nicating system. For example, it is easy to insert filters or monitors at any stage 
within the system, to simulate hardware, etc. 

The function ringing_a uses a case function to call wait_no_seize1(Self,30), which 
waits for a message for a maximum of 30 seconds before timing out. This function 
returns a term of the form event(From, Event) or a timeout; it will be defined later. 

There are three things that can happen when executing ringing_a. The calling
partner may stop the call attempt by replacing the receiver (wait_no_seize1 returns
event(Self,on_hook): R1), the called partner may answer (event(B_Side,answered) is
returned: R2) or a timeout may occur (i.e., after 30 seconds the event timeout is
returned: R3).

If the value event(Self,on_hook) is returned, the ring tone heard by the calling 
subscriber is stopped by calling the function stop_tone(Self). In addition, a message 
is sent to the called partner requesting that the call be abandoned and idle(Self) 
is called (R1). If the called partner answers, the ringing tone is removed and the 
switch is started (start_switch(Self,B_Side): R2). This is an audio switch that sets
up a speech channel between the two subscribers. The call is now in progress and 
the function conversation_a is called. Note that the calling partner starts the switch 
(and will stop it later!). This is a general principle of telephony: The party that 
requests a service (in this case use of the switch) shall pay for it and determine 
when its usage is over. 
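
One way to read Program 13.2 is as a single state of a finite state machine: each case option names an event, the actions to perform, and the state entered next. The Python sketch below restates that reading. The tuple encodings and the function signature are assumptions made for illustration, and the actions are returned as data rather than performed on any hardware.

```python
def ringing_a(event, self_id, b_side):
    """Return (actions, next_state) for one event in the A-side ringing state."""
    if event == ("event", self_id, "on_hook"):       # R1: caller hangs up
        actions = [("stop_tone", self_id), ("send", b_side, "terminate")]
        return actions, "idle"
    if event == ("event", b_side, "answered"):       # R2: called party answers
        actions = [("stop_tone", self_id), ("start_switch", self_id, b_side)]
        return actions, "conversation_a"
    if event == "timeout":                           # R3: nobody answered in time
        actions = [("stop_tone", self_id), ("send", b_side, "terminate")]
        return actions, "wait_clear"
    return [], "ringing_a"                           # R4: ignore everything else
```

The final return plays the part of the catch-all option: an uninteresting event produces no actions and leaves the subscriber in the ringing state.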

The definition for the conversation_a function is given in Program 13.3. This
uses wait_no_seize2, a function similar to wait_no_seize1 that does not support timeouts.


conversation_a(Self,B_Side) ->
    case wait_no_seize2(Self) {
        event(Self,on_hook) =>
            stop_switch(Self,B_Side),
            B_Side ! terminate,
            idle(Self);
        Other =>
            conversation_a(Self,B_Side);
    }.


Program 13.3: The conversation_a Function




The call proceeds until the caller ends the call by going on hook. When this 
happens the switch is stopped (stop_switch(Self,B_Side)) and a message is sent to 
the called partner to tell them that the call is over (B_Side ! terminate). The second
option in the case function states that all other events are ignored. 

Finally, we present the wait_no_seize1 function in Program 13.4. This introduces
Erlang’s receive function, which has the general syntax:


receive TimeOut {
    Sender1 ? Message1 =>
        ...
    Sender2 ? Message2 =>
        ...
    timeout =>
        ...
}


As in the send function, Sender can be either a process identifier or a device
identifier. The receive function waits for a message from a sender which matches
one of its options. The options are tried sequentially in textual order for each
message that reaches the process. If a timeout occurs, then a timeout message
is received. All unmatched messages are buffered for future reception. It is the
receive function that permits a process to selectively wait for specific messages, or
for messages from specific processes.
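
The selective and buffering character of receive can be sketched in Python. This is an illustration under stated assumptions, not a description of the Erlang kernel: the class Mailbox and its methods are hypothetical, options are modeled as (predicate, handler) pairs, and where the real construct would wait or time out the sketch simply returns None.

```python
class Mailbox:
    def __init__(self):
        self.queue = []    # messages that no receive has accepted yet

    def send(self, sender, message):
        self.queue.append((sender, message))

    def receive(self, options):
        """Handle the oldest buffered message that some option accepts.

        `options` is a list of (predicate, handler) pairs, tried in order
        for each message. Messages that match no option stay buffered.
        """
        for i, (sender, message) in enumerate(self.queue):
            for predicate, handler in options:
                if predicate(sender, message):
                    del self.queue[i]
                    return handler(sender, message)
        return None        # the real construct would wait or time out here
```

A process that wants to hear only from itself can pass an option whose predicate checks the sender; messages from other senders remain in the queue and are delivered by a later, less restrictive receive.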


wait_no_seize1(Self,TimeOut) ->
    receive TimeOut {

        From ? seize =>                            % R1
            From ! no,
            wait_no_seize1(Self,TimeOut);

        From ? Msg =>                              % R2
            return(event(From,Msg));

        timeout =>                                 % R3
            return(timeout);

    }.


Program 13.4: The wait_no_seize1 Function.


The wait_no_seize1 function returns all messages except seize messages as events.
A seize message signifies that some other subscriber is trying to call us and “seize”
our phone (R1). In this case, the function replies no (From ! no) to the request
and waits for the next message (R1, R2) or a timeout (R3).




13.3 Compilation of Erlang to Strand


A number of specific problems must be handled when compiling Erlang to Strand:


• Functional semantics of Erlang.

• Sequential rule matching and body execution.

• System calls and communication with the Erlang kernel.

• Error recovery and Erlang process termination.

• Sending messages and the selective and buffering characteristics of receive.


We describe how these problems have been solved and show some examples of 
Strand code generated by the compiler. 

The first compilation stage flattens the original Erlang program. This involves 
replacing all calls to case and receive with calls to extra functions that perform 
the matching operations and message buffering. The transformation required for 
case can actually be viewed as an Erlang source-to-source transformation. For 
example, Program 13.5 shows the result of transforming the function conversation_a 
(Program 13.3). Unfortunately, no valid Erlang source-to-source transformation is 
possible for the receive function, but the basic principle is the same. We will show 
examples of the compiled Strand code for both the case and receive functions. 


conversation_a(Self,B_Side) ->
    Temp = wait_no_seize2(Self),
    conversation_a_case(Self,B_Side,Temp).


conversation_a_case(Self,B_Side,event(Self,on_hook)) ->
    stop_switch(Self,B_Side),
    B_Side ! terminate,
    idle(Self).

conversation_a_case(Self,B_Side,Other) ->
    conversation_a(Self,B_Side).


Program 13.5: Transformation of conversation_a.


The functional semantics of Erlang is supported by adding an extra argument 
to the Strand representation of an Erlang rule. This is used to hold the return 
value. In addition, we generate code to assign a value to this variable. The value 
assigned is either that named in an explicit return or the return value of the last 
call in the rule body. 

Recall that Erlang rules are tried sequentially in textual order. To enforce this 
in Strand an otherwise test is added to each rule guard. Sequential execution of the
calls in a function body is enforced using a short-circuit. The Strand implementation
of an Erlang process carries with it a data structure added by the compiler
and not directly accessible from Erlang. This structure is chained through all 
functions and is used to sequence function execution. 

All system calls except those that directly affect the state of the current process 
are translated into messages to the Erlang kernel. Processes communicate with 
the kernel using streams. Two streams are used: one for sending messages to 
the kernel and one for receiving messages from the kernel. These streams are 
maintained using two extra process arguments per stream. The first argument 
represents the head of the stream; the other represents the rest of the stream after 
this function has completed. 

The design of the compiler ensures that all programming errors manifest them- 
selves as failure to match function arguments. Failure is handled in Strand by 
adding an extra rule to the Erlang function definition. This rule serves as a catch- 
all: It cleans up and terminates the Erlang process correctly in the event of failure. 
In addition, an extra argument is added as an abort flag. Other processes can as- 
sign a value to this variable to terminate all the Strand processes in an Erlang 
process. An extra rule is added before the function rules to test this variable and 
perform any necessary cleanup. This is used if an error has occurred, or we wish 
to terminate the whole Erlang process. We perform all process termination and 
cleanup explicitly to make sure that all streams and Erlang process dependent 
structures are treated correctly. 


As an example of the type of code these rules generate, we show in Pro- 
gram 13.6 the code generated for the function conversation_a originally defined 
in Program 13.3 and transformed to Program 13.5. There are a few points to 
note. The return value is returned through the variable R; To is the message 
stream to the kernel and Fr is the message stream from the kernel. PS is the data 
structure representing the process state. 


The first rule in each function uses the guard test known to test if the A variable
has been set (R1, R3). If it has, then process_terminate is called to clean up correctly.
The last rule in the a_case process definition is an error catch-all (R6). If all
preceding matches fail, this rule calls process_error to clean up and invoke the
error-handling mechanism. An error rule has not been added to conversation_a as
there is no input matching.


The second rule in conversation_a (R2) is a compiled version of the original
(transformed) function rule. It waits for the process data structure (data(PS)) to
be available and then spawns the processes wait_no_seize2 and a_case. The communication
streams and process structure are chained through both new processes.
Note that the return value from the last call (a_case) is passed back as the return
value of the function.


The second rule in a_case is the compiled version of the original case option
which matches a message event(Self,on_hook) (R4). This rule creates a process
stop_switch, sends the message terminate to B and finally spawns an idle process.
Messages are sent using the process send, which adds a suitable data structure to







conversation_a(A,R,To,To1,Fr,Fr1,PS,PS1,_,_) :-                   % R1
    known(A) |
    process_terminate(A,R,To,To1,Fr,Fr1,PS,PS1).
conversation_a(A,R,To,To2,Fr,Fr2,PS,PS2,Self,B) :-                % R2
    otherwise, data(PS) |
    wait_no_seize2(A,T,To,To1,Fr,Fr1,PS,PS1,Self),
    a_case(A,R,To1,To2,Fr1,Fr2,PS1,PS2,Self,B,T).

a_case(A,R,To,To1,Fr,Fr1,PS,PS1,_,_,_) :-                         % R3
    known(A) |
    process_terminate(A,R,To,To1,Fr,Fr1,PS,PS1).
a_case(A,R,To,To3,Fr,Fr2,PS,PS3,Self,B,event(Self,on_hook)) :-    % R4
    otherwise, data(PS) |
    stop_switch(A,R1,To,To1,Fr,Fr1,PS,PS1,Self,B),
    send(A,To1,To2,PS1,PS2,B,terminate),
    idle(A,R,To2,To3,Fr1,Fr2,PS2,PS3,Self).
a_case(A,R,To,To1,Fr,Fr1,PS,PS1,Self,B,Other) :-                  % R5
    otherwise, data(PS) |
    conversation_a(A,R,To,To1,Fr,Fr1,PS,PS1,Self,B).
a_case(A,R,To,To1,Fr,Fr1,PS,PS1,_,_,_) :-                         % R6
    otherwise |
    process_error(A,R,To,To1,Fr,Fr1,PS,PS1).


Program 13.6: A Complete Transformation. 


the output stream. The third rule corresponds to the case option which matches
ignored events (R5). It executes conversation_a to wait for a new message.
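
The order in which the generated rules are tried can be pictured outside Strand. The following Python fragment is only an analogy, not the output of the compiler: the name run_case, the encoding of an unset abort variable as None and the option table are assumptions made for illustration, and the stream and process-state arguments are left out.

```python
def run_case(abort, message, options):
    """Try the rules of a compiled case process in textual order.

    abort   -- None while the abort variable is unset (an assumed encoding)
    options -- list of (matches, action) pairs, one per case option
    """
    # First rule: the abort test, mirroring the guard known(A).
    if abort is not None:
        return ("terminated", abort)
    # Middle rules: one per case option, tried in order.
    for matches, action in options:
        if matches(message):
            return ("ok", action(message))
    # Last rule: the error catch-all, reached only if no option matched.
    return ("error", message)


# Hypothetical options modelled on a_case: on_hook ends the call,
# any other event is ignored and the conversation continues.
options = [
    (lambda m: m == ("event", "self", "on_hook"), lambda m: "idle"),
    (lambda m: m[0] == "event", lambda m: "conversation_a"),
]
```

With these options, an on_hook event from the process itself selects the first option, any other event selects the second, and a term that is not an event falls through to the error result.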

The main remaining problem is receive. Messages arrive on the stream from 
the kernel. When compiling receive, special care must be taken to buffer messages 
that have been received but not yet asked for. This is done by adding an extra 
pair of arguments to form a difference list containing these messages. An auxiliary 
procedure is generated during the “flattening” stage to recurse down the list of 
messages until one that matches is found. 

The compilation of a function definition containing a receive is illustrated in
Program 13.7. This is the Strand code generated for the function wait_no_seize2
(Program 13.4, but without the timeout option). As there is no input matching in
the original Erlang function, no error rule is required. Furthermore, as the body
only contains a call to the auxiliary procedure, testing of the abort variable can be
postponed until the auxiliary procedure, wns_rec (R2). The auxiliary procedure is
called with an empty difference list to buffer unmatched messages. This is defined
by the B arguments (R1).

If the abort variable is set, any unmatched messages in the buffer are prepended
to the stream from the kernel and process_terminate is called as before (R2). A




wait_no_seize2(A,R,To,To1,Fr,Fr1,PS,PS1,Self) :-                  % R1
    wns_rec(A,R,To,To1,Fr,Fr1,PS,PS1,B,B,Self).

wns_rec(A,R,To,To1,Fr,Fr1,PS,PS1,Bh,Bt,_) :-                      % R2
    known(A) | Bt := Fr,
    process_terminate(A,R,To,To1,Bh,Fr1,PS,PS1).
wns_rec(A,R,To,To2,[msg(F,seize)|Fr],Fr1,PS,PS2,Bh,Bt,Self) :-    % R3
    otherwise, data(PS) |
    Bt := Fr, send(A,PS,PS1,To,To1,F,no),
    wait_no_seize2(A,R,To1,To2,Bh,Fr1,PS1,PS2,Self).
wns_rec(A,R,To,To1,[msg(F,Msg)|Fr],Fr1,PS,PS1,Bh,Bt,_) :-         % R4
    otherwise, data(PS) |
    Bt := Fr, To1 := To, Fr1 := Bh, PS1 := PS, R := event(F,Msg).
wns_rec(A,R,To,To1,[Message|Fr],Fr1,PS,PS1,Bh,Bt0,Self) :-        % R5
    otherwise, data(PS) | Bt0 := [Message|Bt],
    wns_rec(A,R,To,To1,Fr,Fr1,PS,PS1,Bh,Bt,Self).


Program 13.7: Transformation of wait_no_seize2.


seize message has the form msg(F,seize). Normal matching is used to wait for
this message on the stream from the kernel (R3). If a seize message is received,
then unmatched messages in the buffer are prepended to the input stream, a send
process is created to send a reply no and finally we again execute wait_no_seize2
to wait for another message. Any message that is not seize is taken off the
input stream (R4). The streams and data state are passed back to the caller and
the return value is assigned. The recursive rule handles the case where a
message on the input stream cannot be matched (R5). The message is appended to
the unmatched message buffer and the process recurses to try to match a new
message.

Note how unmatched messages are added to the local buffer (R5) and the 
buffer prepended to the input stream (R2,4). These actions ensure that the order 
of unreceived messages is not changed. A timeout can be added by starting an 
extra process that binds a variable when a specified time has elapsed. A check on 
this variable is added to the process. 
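
The buffering discipline can be illustrated with a short Python sketch. It is an analogy under stated assumptions rather than a translation of Program 13.7: the function name receive and the use of Python lists for streams are our own, an ordinary list stands in for the difference list, and where the Strand process would suspend on an empty stream the sketch simply returns None.

```python
from collections import deque


def receive(stream, matches):
    """Return (message, remaining_stream) for the first message accepted by
    `matches`. Unmatched messages are buffered and then put back in front
    of the rest of the stream, so their relative order is preserved."""
    buffer = []                     # plays the role of the buffer Bh/Bt
    pending = deque(stream)
    while pending:
        message = pending.popleft()
        if matches(message):
            # Prepend the buffered messages to what is left of the input.
            return message, buffer + list(pending)
        buffer.append(message)      # as in R5: append to the unmatched buffer
    return None, buffer             # nothing matched; the stream is unchanged
```

For example, receiving a seize message from a stream holding ring, seize and busy messages returns the seize message and leaves ring followed by busy, in that order, for later receives.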


13.4 System Architecture 


In order to program telephony applications concisely one has to build a layered
architecture. The layers successively abstract away from hardware details and
towards a set of abstractions that are appropriate for the application. The principal
layers in our system are illustrated in Figure 13.2.

The highest level of the system (Customer) is where telephony features can 




Layer                                   Implemented in

Telephony Features (Customer)           Erlang
Application Operating System (AOS)      Erlang
Basic Operating System (BOS)            Erlang
Run Time Executive (kernel)             Strand


Figure 13.2: System Architecture. 


be programmed. At this level the user need know nothing about error recovery, 
resource allocation, distribution, etc. These features are incorporated at the lower 
levels. The user is presented with abstractions that directly relate to telephony 
transactions. The programmer can directly manipulate entire calls between dif- 
ferent subscribers. Calls can be split and merged and joined into arbitrary and 
complex patterns without bothering about the details of how this is actually per- 
formed at a more detailed level. 

The Application Operating System (AOS) provides abstractions used by the
customer level. The AOS consists of approximately 120 routines. (These were
written by our user group at Ericsson Business Communications in Bollmora,
Sweden.) These routines encapsulate explicit knowledge of telephony: for example,
they “know” what is meant by call forwarding, abbreviated dialing, call collect
debiting, etc. However, they have little knowledge about lower-level mechanisms,
such as error recovery and distribution. At this level in the system, it is
inappropriate for the programmer to know the details of how these mechanisms are
implemented.

The next layer in the system (BOS) is responsible for error recovery (as seen 
by the application programmer), device allocation, name-space management, etc. 
It also supports operations on abstract objects such as switches and tone-sending 
devices. These are abstractions of the physical switches and tone-sending devices. 

The bottom level of the Erlang run-time system (Kernel) is a small Strand
executive. This is responsible for process spawning, interprocess message passing,
error recovery and global name registration. The efficiency of the executive system
can be compared with the efficiency of the underlying Strand system. Spawning an
Erlang process takes eight reductions; sending a message from one Erlang process
to another takes six reductions. (Our message-passing semantics differ from those
in Strand, and offer improved functionality for the telephony programmer.) This
overhead is entirely acceptable for our applications. The advantages of completely
abstracting away all internal details of message passing from the applications
programmer (i.e., all explicit handling of Strand streams) seem to far outweigh any
small differences in efficiency. Where overhead is excessive, the appropriate
Erlang routines can be hand-coded in Strand. The integration of hand-coded routines
with the kernel is simple, since Erlang itself is directly compiled into Strand.


13.5 Performance 


It is too early to say if the approach outlined here can compete with conventional 
techniques in terms of performance when executing pure switching applications. 
Current telephone exchanges are often controlled by special-purpose processors 
and languages that are designed for fast context switching and process creation. 
Our approach is to take a general-purpose language and implement the features 
we require on top of this language. 

It is difficult to directly compare this work with conventional implementations. 
We implemented a different set of features than are available in a conventional sys- 
tem; also, the criteria for evaluation are continuously changing. We are, however, 
satisfied with the results of the experiment and can report significantly reduced 
code volumes and lead times for the applications we have programmed. Whether 
this result will remain true for a larger system with increased functionality is a 
subject of future experiments. 

Further problems must be tackled when this small-scale experiment is ported 
into an embedded real-time environment. We will need to measure the real-time 
performance of the system, paying special attention to problems of worst-case 
garbage collection time and memory utilization. 


13.6 The Experimental System 


The current experimental system consists of two interconnected Ericsson MD110 
PABXs. Each PABX (or node) of our system is controlled by a Sun 3/60. To 
date we have concentrated on building a set of library routines (the AOS) for the 
applications operating system and the necessary BOS support for these routines. 
Using these routines, we have built an experimental facility capable of performing 
the following telephony features: 


• Normal POTS (Plain Ordinary Telephony Services, i.e., A calls B)
• Full three-party services: call transfer, enquiry calls
• Call intrusion (intrude into an existing call)
• Operator services (allow an operator to re-arrange existing calls)
• Call diversion: call forwarding, call forwarding on busy
• Conferencing (multi-party conferencing)
• Abbreviated dialing
• Alarm calls
• Call back on busy




The Erlang implementation of these features consists of approximately 5000 lines 
of code. While such a set of features is only a small fraction of the total number 
of features that exist in a modern private exchange, they are representative of 
the kinds of features that are required. Initial feedback from our users (i.e., the 
telephony programmers who are using Erlang) has been positive. The intention of 
the code is clear and it is our feeling that the code volumes are themselves signif- 
icantly lower than they would have been for a similar set of features programmed 
in a conventional imperative language. 


13.7 Discussion 


It is our contention that Strand offers an acceptable base for industrial real-time 
applications. However, we feel that direct programming in Strand is unnecessary 
for most purposes and that specific application languages should be built on top 
of Strand. In this sense Strand can be viewed as a powerful assembly language for 
programming concurrent applications. We have illustrated this with reference to 
a telephony applications programming language. 

The fact that a single Erlang function of arity N (usually) compiles to a Strand 
process definition of arity N+8 with two extra rules is not insignificant. The 
additional parameters and rules that we have added are all needed if one wishes to 
add a measure of sophisticated error handling to a set of communicating Erlang 
processes and to reduce the amount of detail that a programmer must specify in 
a program. As these additions are very regular, it is easy for an Erlang-to-Strand 
compiler to add these extra constructs. However, this extra detail is a possible 
source of programming error if added manually. The compilation examples show 
a significant reduction in the number of parameters that must be specified and, 
we hope, an increase in the clarity of the code. 

A measure of the power of the Strand language is the relative ease with which
another language system with a different operational model and different
communication mechanisms could be built on top. It proved easy to restrict and
control Strand’s concurrency and to modify its communication mechanisms. It would
have been considerably more difficult to build Erlang on top of an existing
sequential language.


Acknowledgments 


The authors would like to thank the members of the Computer Science Labora- 
tory and in particular the members of the Dunder project group for their help and 
support in this project. The work reported in this chapter originated in the Com- 
puter Science Laboratory at Ericsson Telecom. This laboratory has subsequently 
moved to Ellemtel Utvecklings AB. 


Bibliographic Notes 


This book draws on a variety of material from the published literature. Here we 
provide references for further reading. 

Chapter 1. The program design methodologies introduced in Chapter 1 
and used throughout the book were eloquently described by Wirth [77] and Par- 
nas [60,61]. These influential papers provide excellent material concerning the 
methodologies. Chandy and Misra [18] provide a formal treatment of concurrent 
systems that emphasizes the use of stepwise refinement. 

Chapter 2. Concurrent logic programming languages originated with the 
Relational Language of Clark and Gregory [19]. Subsequent investigations in 
England [20,34,30,63], Israel [53,70], Japan [75] and elsewhere, studied a vari- 
ety of alternative language proposals in a quest to develop practical systems. The 
Strand language is based on the authors’ thesis research [29,74]. Its origins and 
relationship to other proposals are described in [31]. The syntax of Strand derives 
from that of Edinburgh Prolog [76,22]. The presentation of the operational model 
is based on that given by Gerth et al. [32]; Codish, Saraswat and Winsborough 
contributed directly to this definition. Saraswat [67] provides a thorough treat- 
ment of semantic issues in concurrent logic programming languages. The guard 
tests and predefined processes derive from Prolog, Parlog [33] and Logix [68]. 

Chapter 3. The six parallel programming techniques described in Chapter
3 have been developed via a variety of applications. The stream communication
techniques are well-known in operating systems and other contexts [42]. Difference 
lists were described by Clark and Tarnlund [21]. The short-circuit technique is due 
to Takeuchi [73]. Blackboards are a commonly used structure that the authors 
have found particularly useful. Mergers have been widely described [7,14,59] and 
were first specified for concurrent logic programming by Safra and Shapiro [71]. 
This latter specification defines the Strand merger. 

Chapter 4. For general background information on data structures and algo- 
rithms, see Knuth [44]; Aho, Hopcroft and Ullman [2]; and Akl [3]. The minimum- 
cost spanning-tree algorithm used in this chapter was proposed by Kruskal [47]. 
A presentation of this algorithm, and of the data structures needed to represent 
its data, can be found in Aho, Hopcroft and Ullman. This algorithm is essentially 
sequential in nature; several parallel algorithms for the minimum-cost spanning 
tree problem are described and referenced in Akl. The process-oriented approach 
to programming explored in Chapter 4 has many similarities with object-oriented 




programming [40,46] and the actor model of computation [37,1,8]. These simi- 
larities have been explored by Shapiro and Takeuchi [72], Kahn et al. [43] and 
Davison [25]. 

Chapter 5. Andrews and Schneider [4] provide a good survey of concur- 
rent programming. Any good book on operating systems will provide a thorough 
treatment of monitors, mutual exclusion, condition synchronization, deadlock and 
starvation; for example Brinch Hansen [13] and Peterson and Silberschatz [62]. 
Hoare [39] is another useful reference in this area. Peterson and Silberschatz pro- 
vide concrete examples of resource management in distributed systems. Date [24] 
gives a good introduction to database systems. Bernstein and Goodman [9] pro- 
vide a survey of concurrency control issues in distributed databases. The material 
on concurrency control in Section 5.2 is taken from [29]. The Dining Philoso- 
phers problem used as a model for the Philosophical Programmers problem was 
first proposed by Dijkstra [26]. A distributed solution is described in Chandy 
and Misra [17]. Ringwood [63] has also applied a concurrent logic programming 
language to specify a solution to this problem. Search techniques are common to 
many programming problems and are often treated in artificial intelligence texts, 
for example Nilsson [57]. 

Chapter 6. The ideas and descriptions presented in this chapter are based on 
a landmark paper by Parnas [61]. The techniques used to implement intermodule 
calls are described in [29]; similar techniques are employed in the Logix system [68]. 
The code mapping ideas are derived from [74]. 

Chapter 7. Though the material in this chapter focuses on Strand system- 
dependent features, it is based on general computer science techniques. 

Chapter 8. The material in this chapter is derived from [74]. Early work 
in this area [51] presented ideas for utilizing an infinite surface of computers. 
Subsequent work by Shapiro [69] investigated the use of turtle annotations similar 
to those used in Logo [58]. The grid example developed in this chapter is based 
on a paper by Butler et al. [15]. 

Chapter 9. Metaprogramming is a fundamental notion in many applicative 
programming formalisms. Lisp systems, for example, rely heavily on the equiv- 
alence of program and data [65]. Metaprogramming techniques are frequently 
applied in logic programming. Kowalski [45] discusses the use of interpreters to 
simulate alternative evaluation strategies; these ideas are pursued in more detail 
in [11]. Interpreters have also been used to build expert systems [35]. The in- 
tegration of metaprogramming concepts into an environment for concurrent logic 
programming is described in [29]. The simple Strand interpreter presented as Pro- 
gram 9.1 is based on the FCP interpreter presented in [64]. The use of interpreters 
and transformation techniques for managing computation is discussed by Hirsch 
et al. [38]; alternative and more efficient strategies are presented in [29]. The 
techniques of Hirsch et al. have been included in the Logix system [68]. 

Appendix A. The predefined processes and guard tests used in Strand are 
based on those used in Prolog but have appeared in a variety of guises in all 
three of the main concurrent logic programming efforts [34,70,75]. The appendix 
exposition is based on that of Logix [68]. 


Appendix A 


Predefined Tests, Processes 
and Operators 


Strand provides a number of predefined guard tests, body processes and operators. 
These are described briefly here. The reader is referred to the Strand User Manual 
for more information. 


A.1 Guard Tests 


Recall that guard tests perform tests on process arguments and may succeed, 
suspend or fail; they suspend if they encounter a variable during execution. Guard 
tests are principally concerned with type checking and term comparison. Let us 
begin with some preliminary definitions: 


Identical. Two terms are identical if they are the same constant 
(i.e., string or number) or if they are structures of the same arity and 
corresponding subterms are identical. A constant and a structure are 
not identical; nor are different constants or structures of different arity. 


Standard Ordering. The standard ordering of terms is defined to 
be: 


numbers < strings < lists < tuples 


Numbers are ordered according to their numeric value. Integers are 
less than their real equivalents. Strings are ordered according to their 
ASCII value. Lists are ordered on their head then their tail. Tuples 
are ordered by their arity, and then by their arguments, in left-to-right 
order. The empty list is treated as a tuple of arity 0. 
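
The standard ordering can be made concrete with a small comparison function. The Python sketch below is an illustration under assumptions, not part of Strand: Python int and float stand for integers and reals, str for strings, list for lists and tuple for tuples, and the function name compare is our own. It follows the definition literally, so the empty list is ranked as a tuple of arity 0 and therefore sorts after every non-empty list.

```python
from functools import cmp_to_key


def _rank(term):
    """Class rank: numbers < strings < lists < tuples."""
    if isinstance(term, (int, float)):
        return 0
    if isinstance(term, str):
        return 1
    if isinstance(term, list) and term:
        return 2
    if isinstance(term, (tuple, list)):   # tuples, and [] as a tuple of arity 0
        return 3
    raise TypeError("not a term: %r" % (term,))


def _cmp(a, b):
    return (a > b) - (a < b)


def compare(x, y):
    """Return a negative, zero or positive number as x sorts before,
    with or after y in the standard ordering."""
    rx, ry = _rank(x), _rank(y)
    if rx != ry:
        return _cmp(rx, ry)
    if rx == 0:
        # Numeric value first; an integer precedes its real equivalent.
        return _cmp(x, y) or _cmp(isinstance(x, float), isinstance(y, float))
    if rx == 1:
        return _cmp(x, y)                 # character-code order
    if rx == 2:
        # Lists: head first, then tail.
        return compare(x[0], y[0]) or compare(x[1:], y[1:])
    # Tuples: arity first, then arguments from left to right.
    result = _cmp(len(x), len(y))
    for a, b in zip(x, y):
        result = result or compare(a, b)
    return result
```

Sorting a mixed collection with sorted(terms, key=cmp_to_key(compare)) places numbers first, then strings, then non-empty lists, then the empty list and the tuples in order of arity.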




Type Checking. 


string(T)    T is a string.          list(T)      T is a list.
integer(T)   T is an integer.        module(T)    T is a module.
real(T)      T is a real number.     user(T)      T is user-defined data.
tuple(T)     T is a tuple.


Matching. These tests execute left-to-right and depth-first in term structure; 
they suspend if they encounter a variable. 


X == Y       X and Y are identical.
X =\= Y      X and Y are not identical.


Arithmetic Comparison. (Integers and reals) 


X>Y, X<Y, X=<Y, X>=Y 


Term Comparison. (All terms) 


Comparison of two terms X and Y according to the standard order: 


X @< Y, X >@ Y, X @=< Y, X >@= Y


Control. The following processes provide the programmer with additional con- 
trol over program execution. These processes are primarily intended for systems 
programming and are rarely used in application programs. 


otherwise succeeds if all textually previous rules in the process defi- 
nition fail. 


unknown(X) succeeds if the value of X is not currently available and 
fails otherwise. If applied repeatedly to the same value, it is guaranteed 
to fail eventually if X is assigned a value. 


known(X) succeeds if the value of X is currently available and fails 
otherwise. If applied repeatedly to the same value, it is guaranteed to 
succeed eventually if X is assigned a value. 


data(X) suspends until the value of X is available. 


Note that variables cannot be compared. If variables are encountered when
comparing terms using ==, =\=, >@, etc., the test suspends.
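
The difference between known, unknown and data lies in what happens when the argument has no value yet. The following Python sketch models only that distinction. The class Var, the three result constants and the function names are hypothetical stand-ins chosen for illustration; a real guard test is evaluated by the Strand run-time system, not by user code.

```python
class Var:
    """Single-assignment variable (an assumed stand-in for a Strand variable)."""
    _UNSET = object()

    def __init__(self):
        self._value = Var._UNSET

    def is_bound(self):
        return self._value is not Var._UNSET

    def assign(self, value):
        if self.is_bound():
            raise RuntimeError("variable already assigned")
        self._value = value


SUCCEED, FAIL, SUSPEND = "succeed", "fail", "suspend"


def known(x):
    # Never suspends: fails if the value is not currently available.
    return SUCCEED if x.is_bound() else FAIL


def unknown(x):
    # Never suspends: succeeds if the value is not currently available.
    return FAIL if x.is_bound() else SUCCEED


def data(x):
    # Suspends until the value is available.
    return SUCCEED if x.is_bound() else SUSPEND
```

Before a variable is assigned, known fails, unknown succeeds and data suspends; after assignment, known and data succeed and unknown fails.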




A.2 Predefined Processes 


Strand provides a variety of predefined processes for performing arithmetic, gener- 
ating terms, manipulating terms, changing the form of a term (e.g., from a string 
to a list) etc. The principal difference between these processes and guard tests is 
that they are written in a rule body and may generate values. The arguments to 
predefined processes are always either input or output. In general, input implies 
that the process suspends until an appropriate value is available; output implies 
that the process generates a value. Predefined processes typically raise an
exception (similar to arithmetic underflow or divide by zero) in the following
situations:

• An incorrect input is supplied.
• An error occurs during execution (e.g., an attempt to extract the tenth
  argument of a tuple of arity 5).
• An output argument is already bound.


To illustrate the use of these processes, consider the following (not particularly 
meaningful) rule: 


new_tuple(T,S,T1) :-
    tuple(T), string(S) |
    length(T,N),
    N1 is N * 2,
    make_tuple(N1,T1),
    length(S,L),
    put_arg(1,T1,L).


This rule takes two arguments as input, a tuple T and a string S; guard tests are 
used to verify the types of these arguments. The arity of the tuple T is determined 
and returned in the variable N. This number is doubled using the arithmetic process 
is; the new number is placed in N1. A new tuple T1 that is double the size of T 
(i.e., size N1) is then created. An integer representing the length of the string S 
is placed in the first argument of the new tuple. The rule illustrates the use of 
body processes to generate and manipulate terms. Notice that if any argument is 
unknown then the processes will suspend until the appropriate data is available. 
Since these processes generate values, it is always possible to determine when a 
process has terminated by testing for the generated output. 
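
For readers more familiar with sequential languages, the effect of this rule can be approximated as follows. The Python function is a rough analogue under assumptions: None stands for an unbound tuple argument, a list stands for the new tuple, and a failed type check raises an error, whereas in Strand the guard would simply fail or suspend.

```python
def new_tuple(t, s):
    # Guard: tuple(T), string(S).
    if not isinstance(t, tuple) or not isinstance(s, str):
        raise TypeError("expected a tuple and a string")
    n1 = len(t) * 2          # length(T,N), N1 is N * 2
    t1 = [None] * n1         # make_tuple(N1,T1); None marks an unbound argument
    t1[0] = len(s)           # length(S,L), put_arg(1,T1,L)
    return t1
```

The sketch fixes an order of evaluation that the Strand rule does not: there the body processes run concurrently and synchronize only through the availability of N, N1, T1 and L.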

Strand’s predefined processes have been chosen to provide a minimal but useful 
set of operations. Other less-important functions can be implemented as
user-defined processes (see Chapter 7) if required. In the following list,
body-process arguments are annotated to indicate whether they are consumed by the
process (input arguments: ?) or are generated by the process (output arguments: ↑).
Testing of output argument values can be used to ascertain that a process has 
terminated. 




Term Manipulation 


put_arg(N?,T?,A?,R↑) The Nth argument of tuple T is A. R is an
optional argument that is bound to the empty list when the operation
is complete.


get_arg(N?,T?,A↑,R↑) A is the Nth argument of tuple T. R is an
optional argument that is bound to the empty list when the operation
is complete.


make_tuple(A?,T↑) T is a new tuple of arity A that contains unique
variables.


string_to_list(S?,H↑,T?) H/T is a difference list of integers
representing the characters in the string S. T need not be available for this
process to execute.


list_to_string(L?,S↑) S is a string corresponding to the list of ASCII
values L.


list_to_module(L?,M↑) M is the module corresponding to the list of
bytes L. (L must be ground at the time of call.)


tuple_to_list(Tuple?,H↑,T?) H/T is a difference list containing the
elements of Tuple. T need not be available for this process to execute.


list_to_tuple(L?,T↑) T is a tuple containing the elements in the list
L.


list_to_integer(L?,I↑) I is the integer represented by the list of ASCII
values L.


integer_to_list(I?,L↑) L is a list of ASCII values representing the
integer I.


list_to_real(L?,R↑) R is the real represented by the list of ASCII
values L.


real_to_list(R?,L↑) L is a list of ASCII values representing the real
R.


length(T?,L↑) L is the length of a list, string or tuple T.


assign(X↑,Y?,R↑) X is assigned the value Y; R is assigned the empty
list when the operation has completed. Y need not be available for this
process to execute.


Programming in the Large 


run(M?,P?) initiates execution of process P using module M. 




Merger 


merger(I?,O↑) consumes a stream of messages on the input stream
I and outputs them on the output stream O. A message of the form 
merge(S) is not output; instead, S is viewed as an additional input 
stream and messages arriving on S are also output on O. A merger 
guarantees that any message placed on any of its input streams will 
eventually appear on its output stream and that the order of messages 
in each input stream is preserved in the output. The merger closes the 
output stream and terminates when all input streams are closed. 
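
The behaviour of a merger can be sketched sequentially. The Python generator below is an approximation under assumptions: streams are Python iterables, a merge(S) message is encoded as the pair ("merge", S), and inputs are served in round-robin order. A real merger forwards messages as they become available, so the interleaving between streams is not fixed; only the order within each input stream is guaranteed.

```python
from collections import deque


def merger(input_stream):
    """Yield the merged output of input_stream and of any streams that it
    introduces through ("merge", S) messages."""
    active = deque([iter(input_stream)])
    while active:
        stream = active.popleft()
        try:
            message = next(stream)
        except StopIteration:
            continue                         # this input stream is closed
        active.append(stream)                # round-robin: revisit it later
        if (isinstance(message, tuple) and len(message) == 2
                and message[0] == "merge"):
            active.append(iter(message[1]))  # S becomes an additional input
        else:
            yield message
    # All input streams are closed: the output stream ends here.
```

Merging the stream a, merge([x, y]), b in this way produces a, b, x, y: the merge message itself is not output, and x still precedes y.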


Arithmetic 


X↑ is E? X is the result of evaluating the arithmetic expression E. The
result is an integer if only integers are involved and a real otherwise.
No range checking is performed on the results of integer operations.
The following operators are supported:

• +, -, *, / (addition, subtraction, multiplication and division).
  Can be applied to both integers and reals.
• // (modulo). Can only be applied to integers.
• - (unary minus). Can be applied to both integers and reals.
• /\, \/, \ (bitwise conjunction, disjunction and complement). Can
  only be applied to integers.
• real(X) evaluates to the real value of a real or integer X.
• integer(X) evaluates to the integer value of a real or integer X,
  rounding toward zero.
• abs(X) evaluates to the absolute value of real or integer X.


A.3 Predefined Operators 


Strand provides a number of predefined operators so as to make programs easier to 
read. It is also possible to define and use new operators; this is extremely useful, as 
it means that the Strand parser can be employed for other purposes than parsing 
Strand programs. There are three attributes of an operator that are important: 
position, precedence and associativity. 

An operator’s position determines where that operator is written. It can be
either prefix, infix or postfix, meaning first, in the middle or last, respectively. For
example, the operator ‘-’ can be used as an infix operator to mean subtraction
or as a prefix operator to mean unary minus.

The second attribute of an operator is its precedence which specifies its impor- 
tance. For example, it is precedence that informs us that ‘*’ is more important 




than ‘+’, and which determines the order of evaluation when an arithmetic expres- 
sion is executed using is. The order of evaluation can always be explicitly stated 
using brackets, ( ), for disambiguation. 

Finally, the third attribute of an operator is its associativity, which states how
an operator associates with itself. For example, the expression 9/3/3 could be
interpreted as either (9/3)/3 = 1 or 9/(3/3) = 9. There are only two choices
here: An operator associates either to the left or to the right. A left-associative
operator must have the same or lower precedence operators to the left and lower
precedence operators on the right. In Strand, ‘/’ is defined to be left-associative,
so the first order of evaluation is chosen.
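
The two readings of 9/3/3 correspond to folding the operands from the left or from the right. The Python sketch below reproduces both; the helper names fold_left and fold_right are our own, and Python's real-valued division stands in for Strand's ‘/’.

```python
from functools import reduce
from operator import truediv


def fold_left(op, operands):
    """Left-associative reading: ((a op b) op c) ..."""
    return reduce(op, operands)


def fold_right(op, operands):
    """Right-associative reading: a op (b op (c ...))"""
    *init, last = operands
    return reduce(lambda acc, x: op(x, acc), reversed(init), last)


left = fold_left(truediv, [9, 3, 3])     # (9/3)/3
right = fold_right(truediv, [9, 3, 3])   # 9/(3/3)
```

Here left evaluates to 1.0, the reading chosen by a left-associative ‘/’, and right evaluates to 9.0.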

The operators are described in Tables A1, A2 and A3. In each case, the
operator name, precedence and associativity are listed.


Table A1: Infix Operators

Name | Precedence | Associativity


Table A2: Prefix Operators

Name | Precedence | Associativity


Table A3: Postfix Operators

Name | Precedence | Associativity



Bibliography 


[1] Agha, G., Actors: A Model of Concurrent Computation in Distributed
Systems, MIT Press, Cambridge, MA, 1986.


[2] Aho, A.V., Hopcroft, J.E. and Ullman, J.D., Data Structures and Algorithms,
Addison-Wesley, Reading, MA, 1983.


[3] Akl, S.K., The Design and Analysis of Parallel Algorithms, Prentice-Hall,
Englewood Cliffs, NJ, 1989.


[4] Andrews, G.R. and Schneider, F.B., Concepts and notations for concurrent
programming, Computing Surveys, 15(1): 3-43, 1982.


[5] Armstrong, J.L., Elshiewy, N.A. and Virding, S.R., The phoning philosophers
problem, or logic programming for telecommunications applications, Proc. of
the 3rd IEEE Symposium on Logic Programming, 1986.


[6] Armstrong, J.L. and Williams, M., Using Prolog for rapid prototyping of
telecommunications systems, Proc. Software Engineering for Telecommunication
Switching Systems, 1989.


[7] Arvind, Gostelow, K.P. and Plouffe, W., Indeterminacy, monitors and
dataflow, Proc. 6th ACM Symp. on Operating System Principles, OS Review,
11(5): 159-169, 1977.


[8] Athas, W.C. and Seitz, C.L., Cantor user report, Version 2.0, Tech. report
5232:TR:86, Dept. of Computer Science, California Institute of Technology,
1986.


[9] Bernstein, P.A. and Goodman, N., Concurrency control in distributed
databases, Computing Surveys, 13(2): 185-221, 1982.


[10] Billstrom, D., Brandenburg, J. and Teeter, J., CCLisp on the iPSC Concurrent
Computer, Proc. of the Sixth National Conference on Artificial Intelligence,
Seattle, WA, July 1987.


[11] Bowen, K.A. and Kowalski, R.A., Amalgamating language and metalanguage
in logic programming, Logic Programming, Academic Press, NY, 153-172,
1982.




[12] Boyle, J. et al., Portable Programs for Parallel Processors, Holt, Rinehart and 
Winston, Inc., NY, 1987. 


[13] Brinch Hansen, P., Operating Systems Principles, Prentice-Hall, Englewood
Cliffs, NJ, 1973.


[14] Brock, J.D. and Ackerman, W.B., Scenarios: A model of non-deterministic 
computations, Formalization of Programming Concepts, Dias and Ramos 
(Eds), Springer-Verlag LNCS 107, 252-259, 1981. 


[15] Butler, R., Lusk, E., McCune, W. and Overbeek, R., Parallel logic program- 
ming for numeric applications, Proc. 3rd Intl. Conf. on Logic Programming, 
Springer-Verlag LNCS 225, 375-388, 1986. 


[16] CCITT. Functional specification and description language (SDL), CCITT 
Recommendation Z.100, International Telecommunication Union, Geneva, 
1980. 


[17] Chandy, K.M. and Misra, J., The drinking philosophers problem, ACM Trans- 
actions on Programming Languages and Systems, 6(4): 632-646, 1984. 


[18] Chandy, K.M. and Misra, J., Parallel Program Design, Addison-Wesley, Read- 
ing, MA, 1988. 


[19] Clark, K.L. and Gregory, S., A relational language for parallel programming, 
Proc. 1981 ACM Conf. on Functional Programming Languages and Computer 
Architectures, 171-178. 


[20] Clark, K.L. and Gregory, S., Parlog: Parallel programming in logic, ACM 
Trans. Programming Languages and Systems, 8(1): 1-49, 1986. 


[21] Clark, K.L. and Tarnlund, S.A., A first order theory of data and programs,
Information Processing 77: Proc. IFIP Congress 77, B. Gilchrist (Ed.),
Elsevier/North Holland, 939-944.


[22] Clocksin, W.F. and Mellish, C.S., Programming in Prolog, Springer-Verlag, 
1981. 


[23] Dacker, B., Elshiewy, N., Hedeland, P., Welin, C-W. and Williams, M., Exper- 
iments with programming languages and techniques for telecommunications 
applications, Proc. 6th Intl Conf. on Software Engineering for Telecommuni- 
cation Switching Systems, 1986. 


[24] Date, C., An Introduction to Database Systems, Addison-Wesley, Reading, 
MA, 1986. 


[25] Davison, A., Blackboard systems in Polka, Intl Journal of Parallel Programming,
1989.




[26] Dijkstra, E.W., Hierarchical ordering of sequential processes, Acta Informat- 
ica 1, 115-138, 1971. 


[27] Downey, P.H., Sethi, R. and Tarjan, R.E., Variations on the common subexpression
problem, JACM, 27(4): 736-757, 1980.


[28] Ericsson. The MD110. Ericsson Review, 1 and 2, 1982. 


[29] Foster, I.T., Systems Programming in Parallel Logic Languages, Prentice-Hall, 
London, 1989. 


[30] Foster, I.T. and Taylor, S., Flat Parlog: a basis for comparison, Intl Journal 
of Parallel Programming, 16(2), 1988. 


[31] Foster, I.T. and Taylor, S., Strand: a practical parallel programming lan- 
guage, Technical report, Math. and Computer Science Division, Argonne Na- 
tional Laboratory, Argonne, Illinois, 1989. 


[32] Gerth, R., Codish, M., Lichtenstein, Y. and Shapiro, E., Fully abstract de- 
notational semantics for Flat Concurrent Prolog, Proc. Symp. on Logic in 
Computer Science, 1988. 


[33] Gilbert, D. (Ed.), Sequential Parlog machine reference manual, Technical
Report, Dept. of Computing, Imperial College, London, 1987. 


[34] Gregory, S., Parallel Logic Programming in PARLOG, Addison-Wesley, Read- 
ing, MA, 1987. 


[35] Hammond, P., Micro-Prolog for Expert Systems, Micro-Prolog: Programming 
in Logic, Clark, K.L. and McCabe, F.G. (Eds), Prentice-Hall, 1984. 


[36] von Heijne, G., Sequence Analysis in Molecular Biology, Academic Press, New 
York, 1987. 


[37] Hewitt, C.E., Viewing control structures as patterns of passing messages, 
Journal of Artificial Intelligence, 8(3): 323-364, 1978. 


[38] Hirsch, M., Silverman, W. and Shapiro, E., Layers of protection and con- 
trol in the Logix system, Concurrent Prolog: Collected Papers, MIT Press, 
Cambridge, MA, 1987. 


[39] Hoare, C.A.R., Communicating sequential processes, CACM, 21(8): 666-677, 
1978. Also in book with same name, Prentice-Hall, Englewood Cliffs, NJ, 
1985. 


[40] Ingalls, D.H., The Smalltalk-76 programming system and implementation, 
Conf. Record 5th Annual ACM Symposium on Principles of Programming 
Languages, 9-16, 1978. 


[41] Jefferson, D. et al., Distributed simulation and the Time Warp operating 
system, ACM Operating Systems Review, 1987. 




[42] Kahn, G. and MacQueen, D., Coroutines and networks of parallel processes, 
Information Processing 77: Proc. IFIP Congress, B. Gilchrist (Ed.), 993-998, 
North-Holland, 1977. 


[43] Kahn, K., Tribble, E., Miller, M. and Bobrow, D., Objects in concurrent logic 
programming languages, Proc. ACM Conf. on Object-Oriented Programming 
Systems, SIGPLAN Notices 21(11): 242-257, 1986. 


[44] Knuth, D.E., The Art of Computer Programming, Vol. 1: Fundamental Al- 
gorithms, Addison-Wesley, Reading, MA, 1973. 


[45] Kowalski, R.A., Logic for Problem Solving, North-Holland, Amsterdam, 1979. 


[46] Krasner, G., Smalltalk-80, Bits of History, Words of Advice, Addison-Wesley, 
Reading, MA, 1983. 


[47] Kruskal, J., On the shortest spanning subtree of a graph and the traveling 
salesman problem, Proc. Amer. Math. Soc., 7, 48-50, 1956. 


[48] Landau, G.M. and Vishkin, U., An efficient string matching algorithm with K 
substitutions for nucleotide and amino acid sequences, Journal of Theoretical 
Biology, 126, 483-490, 1987. 


[49] Luckham, D.C. et al., Stanford Pascal verifier user manual, Technical report 
STAN-CS-79-731, Stanford University Computer Science Department, 1979. 


[50] Marcus, L., SDVS7 Users’ Manual, Technical report ATR-88(3778)-5, The 
Aerospace Corporation, 1988. 


[51] Martin, A.J., The Torus: An exercise in constructing a processing surface, 
Proc. Conf. on Very Large Scale Integration: Architecture, Design, Fabrica- 
tion, California Institute of Technology, 52-57, 1979. 


[52] Martin, D.F. and Cook, J.V., Adding Ada program verification capability to
the State Delta Verification System (SDVS), Proc. 11th National Computer 
Security Conference, Baltimore, MD, 1988. 


[53] Mierowsky, C., Taylor, S., Shapiro, E., Levy, J. and Safra, M., Design and 
implementation of Flat Concurrent Prolog, Technical Report CS85-09, Dept. 
of Computer Science, Weizmann Institute of Science, Rehovot, Israel, 1985. 


[54] Misra, J., Distributed Discrete-Event Simulation, Computing Surveys 18(1): 
39-65, March 1986. 


[55] Neelamkavil, F., Computer Simulation and Modelling, John Wiley, 1987. 


[56] Nelson, G. and Oppen, D., Fast decision procedures based on congruence 
closure, JACM, 27(2): 356-364, 1980. 


[57] Nilsson, N., Principles of Artificial Intelligence, Springer-Verlag, 1982. 




[58] Papert, S., Mindstorms: Children, Computers, and Powerful Ideas, Basic 
Books, NY, 1980. 


[59] Park, D., On the semantics of fair parallelism, Springer-Verlag LNCS 86, 
Bjorner, D. (Ed.), 504-526, 1980. 


[60] Parnas, D., A technique for software module specification with examples, 
CACM 15: 330-336, 1972. 


[61] Parnas, D., On the criteria to be used in decomposing systems into modules, 
CACM 15(12): 1053-1058, 1972. 


[62] Peterson, J. and Silberschatz, A., Operating Systems Concepts, Addison- 
Wesley, 1983. 


[63] Ringwood, G., PARLOG86 and the dining logicians, CACM, 31(1): 10-25, 
1988. 


[64] Safra, S. and Shapiro, E., Meta-interpreters for real, Proc. IFIP’86, 1986. 


[65] Sandewall, E., Programming in an interactive environment: the Lisp experi- 
ence, Computing Surveys, 35-72, 1978. 


[66] Sankoff, D. and Kruskal, J.B. (Eds), Time Warps, String Edits, and Macro- 
molecules: The Theory and Practice of Sequence Comparison, Addison- 
Wesley, Reading, MA, 1983. 


[67] Saraswat, V., Concurrent constraint programming languages, Ph.D. thesis, 
Computer Science Dept., Carnegie-Mellon University, 1989. 


[68] Silverman, W. et al., The Logix Reference Manual, Version 1.22, Weizmann 
Institute of Science, Technical Report CS86-21, 1986.


[69] Shapiro, E., Systolic programming: a paradigm for parallel processing, Proc. 
Intl Conf. on Fifth Generation Computer Systems, 458-471, 1984. 


[70] Shapiro, E., Concurrent Prolog: a progress report, IEEE Computer, 1986. 


[71] Shapiro, E. and Safra, M., Multiway merge with constant delay in Concurrent 
Prolog, New Generation Computing, 4(2): 211-216. 


[72] Shapiro, E. and Takeuchi, A., Object-oriented programming in Concurrent
Prolog, New Generation Computing, 1(1): 25-49, 1986. 


[73] Takeuchi, A., How to solve it in Concurrent Prolog, unpublished note, ICOT, 
1983. 


[74] Taylor, S., Parallel Logic Programming Techniques, Prentice-Hall, Englewood 
Cliffs, NJ, 1989. 


[75] Ueda, K., Guarded Horn Clauses, Eng.D. thesis, University of Tokyo, 1986. 




[76] Warren, D.H.D., Applied logic — its use and implementation as a program- 
ming tool, SRI International Tech. Rep. 290, 1983. 


[77] Wirth, N., Program development by stepwise refinement, CACM 14: 221-227, 
1971. 


Index 


Aborting execution, 141, 299 
Abstract data type, 259 
Abstraction, 4 
for message-passing, 46 
of hardware, 10, 187, 194 
for telephony, 291, 301 
Accomplishing effects, 175, 178 
Acquaintances, 279 
Actors, 306 
Algorithms 
CMatch, 42, 47 
Kruskal, 92, 305 
congruence closure, 235, 238 
find, 99, 240 
merge sort, 191 
sequence alignment, 257 
union, 99, 240 
Alignment algorithm, 257 
Analysis of programs, 219, 263 
Anti-message, 275, 285 
Applications, 11 
biology, 253 
formal verification, 235 
simulation, 273 
telephony, 289 
Arguments, 19 
Arithmetic, 26 
Arithmetic comparison, 308 
Assign, 81 
Assignment, 25, 30 
Assignment process, 25 
Atomic transaction, 128 
Atomic updates, 84, 131 
Avoiding copying, 110 
Avoiding overflow, 70 


Basic operations, 29 
assignment, 30 
guard execution, 35 
matching, 31 

Binary trees, 17, 111 

Blackboards, 83, 131 

Body of rule, 21 

Bounded Buffers, 69, 124 

Broadcast, 127 

Busy waiting, 43 


C code, 6, 11, 12, 169 
CMatch algorithm, 42, 47 
Case studies, 11 
Catch-all rule, 299 
Change of process state, 22 
Circular references in terms, 31
Code mapping, 166 
Code modules, 162 
Code volume, 303 
Commit operator (|), 21, 219
Communication, 7 
Communication channel, 9, 21, 56, 
96 

Communication protocols, 56 

see Programming techniques 
Computation, 47 

successful, 47 

suspending, 47 
Concurrency control, 128, 306 
Concurrent logic programming, 291,
305

Condition synchronization, 120, 306 
Congruence closure, 235, 238 
Constraining concurrency, 206 
Controlling message generation, 70 




Control, 308 
Critical section, 249 
Critical subsequences, 257 


Data decomposition, 211 
Data structures, 91 
accumulator, 38, 39, 48, 197 
blackboards, 83 
built-in, 15 
design constraints, 107 
design principles, 113 
design to avoid copying, 110 
hash tables, 114, 171 
heap, 114 
sets, 106 
to represent programs, 220 
trees, 17, 108, 171, 220 
Data-dependencies, 134, 261, 263 
Data-flow synchronization, 34 
see matching 
Data-oriented design, 105, 113 
Databases, distributed, 128 
Databases, 176, 178, 254, 261 
Deadlock, 144, 250, 306 
Declarations, multi-lingual programs, 
174, 180 
Decomposition, parallel execution, 169, 
187 
by data, 211 
by function, 190, 192 
Destructuring of data, 33 
see matching 
Device independence, 187 
Dictionary, 133, 244 
Difference lists, 73, 119, 121, 132, 140, 
204, 300, 305 
Direction annotations, 188, 194 
Dirichlet problem, 210 
Discard message, 286 
Discrete event simulation, 274 
Distributed databases, 128 
Distributor, 100 
Divide-and-conquer strategy, 263 
Domain Isolation, 280 
Dynamic programming, 256 




Empty lists, 17 
Encapsulation 

in modules, 8 

in program design, 5 

of common components, 152 

of data structures, 84, 120 

of databases, 182 

physical devices, 120 
Equivalence class, 237 
Equivalence of program & data, 219 
Erlang, 291 
Error arguments, 175 
Error recovery, 293 
Estimation function, 140 
Event queue, 274, 283, 284 
Example computations, 28, 44, 61 
Exportation of function, 151 
Exports statement, 159 
Extensibility, 11 


Failure, 43, 299 

Find & union algorithms, 99, 240 
Forking, 23 

Fortran code, 6, 11, 12, 169, 210 
Functional decomposition, 190, 192 


Garbage collection, 175, 215 
Genbank, 253 
Generation function, 136 
Global past, 275, 285 
Granularity, 198, 236, 270 
Grid problem, 210 
Ground, 47 
Guard execution, 35 
Guard of rule, 21 
Guard tests, 307 
data, 308 
known, 299, 308 
unknown, 142, 308 
otherwise, 308 


Hash tables, 114, 171 

Head of rule, 21 

Heap, 114 

Hypercube architecture, 11, 287 




Identical terms, 36 
Identical, 307, 308 
Implies operator (:-), 21, 219
Importation of functions, 151 
Incomplete messages, 63, 67, 98, 100, 
104, 121, 243, 280 
Indel, 255 
Infix operators, 219, 311 
Information hiding, 5 
Integrals, 196 
Integration of existing code, 6, 169, 
210, 259 
Interface builder, 173, 180 
Interface declaration, 174, 180 
Interfaces, in multi-lingual programs, 
172 
Interfaces, to modules, 8 
Intermodule call, 159 
Interpreters, 221 
for arithmetic expressions, 221 
for Strand, 42, 222 
performance of, 221 
tracing, 225, 231 


Kruskal’s algorithm, 92, 305 


Language dependent features, 174 
Length, 51 

Lexical ordering primitives, 112, 308 
Libraries, 207 

Lightweight processes, 292 

Link process, 81, 231 

Lists, 16, 17 

Load balancing, 201, 205, 263, 282 
Local time, 286 

Locality, 282 

Locking, 250 


Macro packages, 173 

Manager-worker strategy, 201, 207, 
263 

Mapping, star to a ring, 201 

Match algorithm, 34 

Matching, 24, 31, 308 

Matrix multiplication, 173, 195 




Memory performance, 215 
Merge sort, 191 
Merger, 67, 84, 102, 119, 140, 305 
Mesh architecture, 11, 193 
Message order, 293 
Metaprogramming, 219, 221, 227, 306 
Minimum-cost spanning tree, 92 
Modular decomposition, 5, 151, 155 
Modules 

export list, 8 

in program design, 5, 151 

in Strand, 159, 162 

interface definitions, 8, 151, 157 
Module server, 163 
Monitors, 120, 144, 181, 306 
Multi-lingual programming, 169, 210, 

259 

Multiple readers and writers, 84 
Mutable structures, 114, 176, 215, 235 
Mutual exclusion, 120, 144, 250, 306 


Nondeterministic process selection, 43 
Numbers, 15 


Object-oriented programming, 291 
Operational model, 47 
Operators, 311 
commit (|), 21, 219 
difference lists (/), 74, 77 
implies (:-), 21, 219
infix, 219 
intermodule call, 159 
Otherwise, 298 
Overflow, 69 
Overview of the book, 11 


Parlog, 291 
Partial orderings, 6 
Partitioning, 169, 187, 204 
Performance studies, 269 
Perpetual process, 240, 279 
Pin, 257 
best, 257 
clean, 257 
Pointers, in user-defined data types, 
175, 178 




Postfix operators, 311 
Predefined operators, 311 
Predefined processes, 21, 51, 309 
arithmetic, 26, 311 
assignment, 25 
merger, 84, 311 
run, 162, 310 
term manipulation, 310 
Prefix operators, 311 
Principles for interface design, 175 
Process mapping, 187 
ring, 188, 197, 201, 264, 282 
torus, 193, 198 
using quotas, 282 
Process pool, 27 
Process structures, 91 
blackboards, 83 
dictionary, 133, 244 
distributor, 100 
module server, 163 
queue, 283, 284 
sets, 99 
star, 201 
trees, 99 
Process, 9 
arguments, 19 
change of state, 22 
definition, 19, 22 
execution, 27, 44 
failure, 43 
forking, 23 
termination, 22 
types of action, 20, 22 
Process-oriented design, 95, 113 
Producer-consumers, 58, 104, 121
Program design methodology, 305 
for multi-lingual programs, 171 
backtracking in, 94, 103 


in Strand, 8 

modular decomposition, 5, 151, 
155 

stepwise refinement, 4, 94, 152, 
158 


Programming in the large, 151 
Programming styles, 91, 113 




data-oriented, 105 
process-oriented, 95 
Programming techniques, 55 
blackboards, 83, 131 
bounded buffers, 69, 124 
difference lists, 73, 119, 121, 132, 
140, 204, 300, 305 
incomplete messages, 63, 67, 98, 
100, 104, 121, 243, 280 
monitor, 120 
producer-consumers, 58, 104, 121, 
242, 280, 299 
short-circuits, 79, 138, 205, 229, 
250, 299, 305 
Prolog, 291, 305 


Queue, 283, 284 
Quotas, for process mapping, 282 


Ramp-up/ramp-down, 207, 270 

Receiving messages, 46, 56 

Recursion, 38, 59, 97 

Relational language, 305 

Replying to messages, 64 

Representation of programs, 220, 227 

Reuse of code, 6, 11, 169, 210 

Ring mappings, 188, 197, 201, 264, 
282 

Ring virtual machine, 188 

Roll back, 275 

Routers, 285 

Rule copying, 44 

Rule form, 223, 227 

Rules, 21 

Run, 162 


SDL, 293 

Scaleability, 187 

Scheduling, 134 

Search, 136, 140, 199, 261 
bounded, 139 
depth-first, 137, 206 
exhaustive, 137 
heuristic, 139 

Semantics, 305 




Sending messages, 46, 56 
Sequence, of RNA, 259 
Sequencing, 81, 103, 176, 250, 298 
Sequential languages, 6, 11, 169 
Sets, 99, 106 
Shared memory machines, 11, 269 
Short-circuit, 79, 138, 205, 229, 250, 
299, 305 
Side effects, 292 
Simulation, 273 
Single-assignment rule, 31, 57, 114, 
175, 176, 243 

Spawning of processes, 23 
Standard ordering, 307 
Star virtual machines, 201 
Starvation, 144, 250, 306 
Static distribution, 282 
Stepwise refinement, 4, 94, 152, 158 
Stopping condition, 39, 59, 97 
Storing pointers, 175, 178 
Streams, 58 
Strings, 15 
Structures, 16 
Successful computation, 47 
Sun computers, 11 
Support for integration, 172 
Suspending computation, 47 
Suspension, 34, 43 
Synchronization, 7, 34, 103, 120, 249
Syntax 

of modules, 159 

of process mapping, 188 

of processes, 19 

of rules, 21 

of terms, 15 
System dependent features, 174 


Teamwork, 3 

Telephony, 289 

Term comparison, 308 

Term manipulation, 310 
Termination detection, 79, 281, 283 
Termination function, 136 
Termination of assignment, 81 
Termination of processes, 22 




Terms, 15 
Test operations, 21, 35 
arithmetic comparison, 36, 308 
control, 308 
matching, 308 
term comparison, 36, 308 
type-checking, 36, 308 
Testing predicates, 82 
Testing values, 307 
Time warp, 275, 285 
Time-dependency, 177 
Timeout, 296 
Torus mappings, 193, 198 
Torus virtual machine, 193 
Total ordering, 112 
Tracing interpreter, 225, 231 
Transformations, source-to-source, 227 
Erlang to Strand, 298 
for process mapping, 263, 265 
to add short-circuit, 229 
to rule form, 227 
Transition rules, 47 
Transputers, 11 
Tree virtual machine, 217 
Trees, 17, 99, 108, 111, 171, 220
Triangle problem, 199 
Tuples, 16, 17 
Two-stage commit, 128 
Type checking, 308 


User-defined data types, 171, 175, 179, 
212, 260 

User-defined operations, 171, 179, 212, 
260 


Variables, 16 

as channels, 9, 21, 46, 56 

single-assignment property, 31 
Vector processors, 11, 172 
Verification, 235 
Virtual machines, 10 

ring, 188 

star, 201 

torus, 193 

tree, 217 


STRAND™: NEW CONCEPTS IN
PARALLEL PROGRAMMING
Ian Foster/Stephen Taylor


A commercially supported
implementation of Strand is now
available: STRAND88.


If you would like further information
on STRAND88, please clip this
coupon and return to:


USA and CANADA

Strand Software Technologies Inc.
P.O. Box 5639
Beaverton, OR 97007-9998
USA


EUROPE

Strand Software Technologies Division
AI Limited
Greycaine Road
Watford
Hertfordshire, WD2 4JP
England


I would like more information on STRAND88.

Please send me:

[ ] STRAND88 Information Pack

NAME
ADDRESS
CITY
STATE          ZIP
PHONE


Strand 


New Concepts in 
Parallel Programming 


Ian Foster - Stephen Taylor 


STRAND SOFTWARE TECHNOLOGIES INC. 


Strand is the first of a new generation of soft- 
ware languages specifically designed for parallel 
programming. STRAND88 is the first commercial 
implementation of the Strand language. 


Programmers can now build portable, parallel
applications. STRAND88 also permits existing
sequential applications, written in languages
such as C and Fortran, to be utilized with
minimum reprogramming effort.


STRAND88 is available from Strand Software
Technologies Inc. and authorized distributors.


This book provides a self-contained introduction to concurrent logic programming
techniques using a commercially supported language, Strand.
The presentation provides a practitioner’s introduction to a variety of ideas
that have previously been available only through the research literature; it
is suitable for students and industrial programmers alike.


The book is divided into three parts: 


Part 1 explains basic concepts. It introduces Strand and six fundamental
programming techniques. Two different stylistic approaches to programming
are then described, and solutions to a variety of programming problems
are developed.


Part 2 describes advanced techniques. This part introduces concepts for 
programming in the large, re-using sequential code, process mapping and 
metaprogramming. 

Part 3 consists of case studies. These studies have been contributed by 
academic and industrial collaborators. Each describes the use of Strand in 
a particular application. 


PRENTICE HALL 
Englewood Cliffs, N.J. 07632 





